Abstract

We initiate the study of inverse problems in approximate uniform generation, focusing on uniform
generation of satisfying assignments of various types of Boolean functions. In such an inverse problem,
the algorithm is given uniform random satisfying assignments of an unknown function $f$ belonging to
a class C of Boolean functions (such as linear threshold functions or polynomial-size DNF formulas),
and the goal is to output a probability distribution D which is $\epsilon$-close, in total variation distance, to the
uniform distribution over $f^{-1}(1)$. Problems of this sort comprise a natural type of unsupervised learning
problem in which the unknown distribution to be learned is the uniform distribution over satisfying
assignments of an unknown function $f \in C$.

Positive results: We prove a general positive result establishing sufficient conditions for efficient 
inverse approximate uniform generation for a class C. We define a new type of algorithm called a 
densifier for C, and show (roughly speaking) how to combine (i) a densifier, (ii) an approximate counting 
/ uniform generation algorithm, and (iii) a Statistical Query learning algorithm, to obtain an inverse 
approximate uniform generation algorithm. We apply this general result to obtain a poly$(n, 1/\epsilon)$-time
inverse approximate uniform generation algorithm for the class of n-variable linear threshold functions
(halfspaces); and a quasipoly$(n, 1/\epsilon)$-time inverse approximate uniform generation algorithm for the
class of poly(n)-size DNF formulas.

Negative results: We prove a general negative result establishing that the existence of certain types 
of signature schemes in cryptography implies the hardness of certain inverse approximate uniform gen- 
eration problems. We instantiate this negative result with known signature schemes from the cryp- 
tographic literature to prove (under a plausible cryptographic hardness assumption) that there are no 
subexponential-time inverse approximate uniform generation algorithms for 3-CNF formulas; for in- 
tersections of two halfspaces; for degree-2 polynomial threshold functions; and for monotone 2-CNF 
formulas. 

Finally, we show that there is no general relationship between the complexity of the "forward" ap- 
proximate uniform generation problem and the complexity of the inverse problem for a class C - it 
is possible for either one to be easy while the other is hard. In one direction, we show that the exis- 
tence of certain types of Message Authentication Codes (MACs) in cryptography implies the hardness 
of certain corresponding inverse approximate uniform generation problems, and we combine this gen- 
eral result with recent MAC constructions from the cryptographic literature to show (under a plausible 
cryptographic hardness assumption) that there is a class C for which the "forward" approximate uniform 
generation problem is easy but the inverse approximate uniform generation problem is computationally 
hard. In the other direction, we also show (assuming the GRAPH ISOMORPHISM problem is com- 
putationally hard) that there is a problem for which inverse approximate uniform generation is easy but 
"forward" approximate uniform generation is computationally hard. 
1 Introduction



The generation of (approximately) uniform random combinatorial objects has been an important research 
topic in theoretical computer science for several decades. In complexity theory, well-known results have 
related approximate uniform generation to other fundamental topics such as approximate counting and the 
power of nondeterminism [JS89, JVV86, SJ89, Sip83, Sto83]. On the algorithms side, celebrated algorithms
have been given for a wide range of approximate uniform generation problems such as perfect matchings
[JSV04], graph colorings (see e.g. [Jer95, Vig99, HV03]), satisfying assignments of DNF formulas
[KL83, JVV86, KLM89], of linear threshold functions (i.e., knapsack instances) [MS04, Dye03], and more.



Before describing the inverse problems that we consider, let us briefly recall the usual framework of
approximate uniform generation. An approximate uniform generation problem is defined by a class C of
combinatorial objects and a polynomial-time relation $R(x, y)$ over $C \times \{0,1\}^*$. An input instance of the
problem is an object $x \in C$, and the problem, roughly speaking, is to output an approximately uniformly
random element $y$ from the set $R_x := \{y : R(x, y) \text{ holds}\}$. Thus an algorithm A (which must be randomized)
for the problem must have the property that for all $x \in C$, the output distribution of $A(x)$ puts approximately
equal weight on every element of $R_x$. For example, taking the class of combinatorial objects to be {all $n \times n$
bipartite graphs} and the polynomial-time relation R over (G, M) pairs to be "M is a perfect matching in
G," the resulting approximate uniform generation problem is to generate an (approximately) uniform perfect
matching in a given bipartite graph; a poly$(n, \log(1/\epsilon))$-time algorithm was given in [JSV04]. As another
example, taking the combinatorial object to be a linear threshold function (LTF) $f(x) = \mathrm{sign}(w \cdot x - \theta)$
mapping $\{-1,1\}^n \to \{-1,1\}$ (represented as a vector $(w_1, \ldots, w_n, \theta)$) and the polynomial-time relation
R over $(f, x)$ to be "x is a satisfying assignment for f," we arrive at the problem of generating approximately
uniform satisfying assignments for an LTF (equivalently, feasible solutions to zero-one knapsack).
A polynomial-time algorithm was given by [MS04] and a faster algorithm was subsequently proposed by
[Dye03].



The focus of this paper is on inverse problems in approximate uniform generation. In such problems,
instead of having to output (near-)uniform elements of $R_x$, the input is a sample of elements drawn uniformly
from $R_x$, and the problem (roughly speaking) is to "reverse engineer" the sample and output a distribution
which is close to the uniform distribution over $R_x$. More precisely, following the above framework, a
problem of this sort is again defined by a class C of combinatorial objects and a polynomial-time relation
R. However, now an input instance of the problem is a sample $\{y_1, \ldots, y_m\}$ of strings drawn uniformly at
random from the set $R_x := \{y : R(x, y) \text{ holds}\}$, where now $x \in C$ is unknown. The goal is to output an
$\epsilon$-sampler for $R_x$, i.e., a randomized algorithm (which takes no input) whose output distribution is $\epsilon$-close
in total variation distance to the uniform distribution over $R_x$. Revisiting the first example from the previous
paragraph, for the inverse problem the input would be a sample of uniformly random perfect matchings of
an unknown bipartite graph G, and the problem is to output a sampler for the uniform distribution over all
perfect matchings of G. For the inverse problem corresponding to the second example, the input is a sample
of uniform random satisfying assignments of an unknown LTF over the Boolean hypercube, and the desired
output is a sampler that generates approximately uniform random satisfying assignments of the LTF.

Discussion. Before proceeding we briefly consider some possible alternate definitions of inverse approximate
uniform generation, and argue that our definition is the "right" one (we give a precise statement of our
definition in Section 2; see Definition 11).

One stronger possible notion of inverse approximate uniform generation would be that the output
distribution should be supported on $R_x$ and put nearly the same weight on every element of $R_x$, instead of just
being $\epsilon$-close to uniform over $R_x$. However, a moment's thought suggests that this notion is too strong, since
it is impossible to efficiently achieve this strong guarantee even in simple settings. (Consider, for example,
the problem of inverse approximate uniform generation of satisfying assignments for an unknown LTF.
Given access to uniform satisfying assignments of an LTF $f$, it is impossible to efficiently determine whether
$f$ is (say) the majority function or an LTF that differs from majority on precisely one point in $\{-1,1\}^n$, and
thus it is impossible to meet this strong guarantee.)

Another possible definition of inverse approximate uniform generation would be to require that the
algorithm output an $\epsilon$-approximation of the unknown object $x$ instead of an $\epsilon$-sampler for $R_x$. Such a
proposed definition, though, leads immediately to the question of how one should measure the distance
between a candidate object $x'$ and the true "target" object $x$. The most obvious choice would seem to be the
total variation distance between $U_{R_x}$ (the uniform distribution over $R_x$) and $U_{R_{x'}}$; but given this distance
measure, it seems most natural to require that the algorithm actually output an $\epsilon$-approximate sampler for
$R_x$.

Inverse approximate uniform generation via reconstruction and sampling. While our ultimate goal,
as described above, is to obtain algorithms that output a sampler, algorithms that attempt to reconstruct
the unknown object $x$ will also play an important role for us. Given C, R as above, we say that an
$(\epsilon, \delta)$-reconstruction algorithm is an algorithm $A_{\mathrm{reconstruct}}$ that works as follows: for any $x \in C$, if $A_{\mathrm{reconstruct}}$
is given as input a sample of $m = m(\epsilon, \delta)$ i.i.d. draws from the uniform distribution over $R_x$, then with
probability $1 - \delta$ the output of $A_{\mathrm{reconstruct}}$ is an object $\tilde{x} \in \tilde{C}$ such that the variation distance
$d_{TV}(U_{R_x}, U_{R_{\tilde{x}}})$ is at most $\epsilon$. (Note that the class $\tilde{C}$ need not coincide with the original class C, so
$\tilde{x}$ need not necessarily belong to C.) With this notion in hand, an intuitively appealing schema for algorithms
that solve inverse approximate uniform generation problems is to proceed in the following two stages:

1. (Reconstruct the unknown object): Run a reconstruction algorithm $A_{\mathrm{reconstruct}}$ with accuracy and
confidence parameters $\epsilon/2, \delta/2$ to obtain $\tilde{x} \in \tilde{C}$;

2. (Sample from the reconstructed object): Let $A_{\mathrm{sample}}$ be an algorithm which solves the approximate
uniform generation problem $(\tilde{C}, R)$ to accuracy $\epsilon/2$ with confidence $1 - \delta/2$. The desired sampler is
the algorithm $A_{\mathrm{sample}}$ with its input set to $\tilde{x}$.

We refer to this as the standard approach for solving inverse approximate uniform generation problems. 
Most of our positive results for inverse approximate uniform generation can be viewed as following this 
approach, but we will see an interesting exception in Section 7, where we give an efficient algorithm for an
inverse approximate uniform generation problem which does not follow the standard approach. 
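
The two-stage schema above can be sketched on a deliberately simple toy class. The class used here (monotone conjunctions over $\{-1,1\}^n$, whose satisfying assignments fix some coordinates to +1) and the function names `reconstruct_conjunction`, `make_sampler` and `inverse_generate` are illustrative assumptions, not constructions from this paper; the sketch only shows how a reconstruction stage and a sampling stage compose.

```python
import random

def reconstruct_conjunction(samples):
    """Stage 1 (toy): guess the conjunction from positive samples.

    Assumes each sample is a tuple over {-1, +1} that satisfies an unknown
    monotone conjunction. A coordinate is kept iff it equals +1 in every sample.
    """
    n = len(samples[0])
    return [i for i in range(n) if all(y[i] == 1 for y in samples)]

def make_sampler(n, fixed, rng):
    """Stage 2 (toy): a zero-input sampler, uniform over assignments with fixed coords = +1."""
    fixed_set = set(fixed)

    def sampler():
        return tuple(1 if i in fixed_set else rng.choice((-1, 1)) for i in range(n))

    return sampler

def inverse_generate(samples, rng):
    """Standard approach: reconstruct the object, then sample from the reconstruction."""
    n = len(samples[0])
    return make_sampler(n, reconstruct_conjunction(samples), rng)
```

For this toy class the reconstruction errs only if some free coordinate happens to equal +1 in every sample, which becomes exponentially unlikely as the sample grows.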

1.1 Relation between inverse approximate uniform generation and other problems. Most of our results
will deal with uniform generation problems in which the class C of combinatorial objects is a class
of syntactically defined Boolean functions over $\{-1,1\}^n$ (such as the class of all LTFs, all poly(n)-term
DNF formulas, all 3-CNFs, etc.) and the polynomial-time relation $R(f, y)$ for $f \in C$ is "y is a satisfying
assignment for f." In such cases our inverse approximate uniform generation problem can be naturally
recast in the language of learning theory as an unsupervised learning problem (learning a probability
distribution from a known class of possible target distributions): we are given access to samples from $U_{f^{-1}(1)}$,
the uniform distribution over satisfying assignments of $f \in C$, and the task of the learner is to construct a
hypothesis distribution D such that $d_{TV}(U_{f^{-1}(1)}, D) \le \epsilon$ with high probability. We are not aware of prior
work in unsupervised learning that focuses specifically on distribution learning problems of this sort (where
the target distribution is uniform over the set of satisfying assignments of an unknown member of a known
class of Boolean functions).

Our framework also has some similarities to "uniform-distribution learning from positive examples
only," since in both settings the input to the algorithm is a sample of points drawn uniformly at random from
$f^{-1}(1)$, but there are several differences as well. One difference is that in uniform-distribution learning
from positive examples the goal is to output a hypothesis function $h$, whereas here our goal is to output
a hypothesis distribution (note that outputting a function $h$ essentially corresponds to the reconstruction
problem described above). A more significant difference is that the success criterion for our framework is
significantly more demanding than for uniform-distribution learning. In uniform-distribution learning of a
Boolean function $f$ over the hypercube $\{-1,1\}^n$, the hypothesis $h$ must satisfy $\Pr[h(x) \neq f(x)] \le \epsilon$,
where the probability is uniform over all $2^n$ points in $\{-1,1\}^n$. Thus, for a given setting of the error
parameter $\epsilon$, in uniform-distribution learning the constant $-1$ function is an acceptable hypothesis for any
function $f$ that has $|f^{-1}(1)| \le \epsilon 2^n$. In contrast, in our inverse approximate uniform generation framework
we measure error by the total variation distance between $U_{f^{-1}(1)}$ and the hypothesis distribution D, so no
such "easy way out" is possible when $|f^{-1}(1)|$ is small; indeed the hardest instances of inverse approximate
uniform generation problems are often those for which $f^{-1}(1)$ is a very small fraction of $\{-1,1\}^n$.
Essentially we require a hypothesis with small multiplicative error relative to $|f^{-1}(1)|/2^n$ rather than the
additive-error criterion that is standard in uniform-distribution learning. We are not aware of prior work on
learning Boolean functions in which such a "multiplicative-error" criterion has been employed.
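
A small numerical illustration of the gap between the two criteria may help. The sets below are arbitrary choices made for this sketch (they are not taken from the paper): $f$ has 4 satisfying assignments out of $2^{10}$, and a hypothesis $h$ accepts those 4 points plus 4 others. The additive disagreement $\Pr[h(x) \neq f(x)]$ is tiny, while the total variation distance between the two uniform distributions over satisfying assignments is large.

```python
from fractions import Fraction
from itertools import product

def tv_uniform(A, B):
    """Exact total variation distance between the uniform distributions over
    the nonempty finite sets A and B (half the L1 distance)."""
    total = Fraction(0)
    for x in A | B:
        pa = Fraction(int(x in A), len(A))
        pb = Fraction(int(x in B), len(B))
        total += abs(pa - pb)
    return total / 2

n = 10
cube = list(product((-1, 1), repeat=n))
f_sat = set(cube[:4])   # satisfying assignments of the (hypothetical) target f
h_sat = set(cube[:8])   # satisfying assignments of the (hypothetical) hypothesis h

# Additive criterion: fraction of the whole cube on which h and f disagree.
additive_error = Fraction(len(f_sat ^ h_sat), len(cube))
# Criterion used in this framework: distance between the two uniform distributions.
tv_error = tv_uniform(f_sat, h_sat)
```

Here `additive_error` is 1/256 while `tv_error` is 1/2, so a hypothesis that is excellent under the additive criterion can be a poor one under the total variation criterion.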

We summarize the above discussion with the following observation, which essentially says that recon- 
struction algorithms directly yield uniform-distribution learning algorithms: 

Observation 1. Let C be a class of Boolean functions $\{-1,1\}^n \to \{-1,1\}$ and let $R(f, y)$ be the relation
"y is a satisfying assignment for f." Suppose there exists a $t(n, \epsilon, \delta)$-time $(\epsilon, \delta)$-reconstruction algorithm
for C that outputs elements of $\tilde{C}$. Then there is an $(O(\log(1/\delta)/\epsilon^2) + O(t(n, \epsilon, \delta/3) \cdot \log(1/\delta)/\epsilon))$-time
uniform-distribution learning algorithm that outputs hypotheses in $\tilde{C}$ (i.e., given access to uniform random
labeled examples $(x, f(x))$ for any $f \in C$, the algorithm with probability $1 - \delta$ outputs a hypothesis $h \in \tilde{C}$
such that $\Pr[h(x) \neq f(x)] \le \epsilon$).

Proof. The learning algorithm draws an initial set of $O(\log(1/\delta)/\epsilon^2)$ uniform labeled examples to estimate
$|f^{-1}(1)|/2^n$ to within an additive $\pm(\epsilon/4)$ with confidence $1 - \delta/3$. If the estimate is less than $3\epsilon/4$
the algorithm outputs the constant $-1$ hypothesis. Otherwise, by drawing $O(t(n, \epsilon, \delta/3) \cdot \log(1/\delta)/\epsilon)$
uniform labeled examples, with failure probability at most $\delta/3$ it can obtain $t(n, \epsilon, \delta/3)$ positive examples
(i.e., points that are uniformly distributed over $f^{-1}(1)$). Finally the learning algorithm can use these
points to run the reconstruction algorithm with parameters $\epsilon, \delta/3$ to obtain a hypothesis $h \in \tilde{C}$ that has
$d_{TV}(U_{f^{-1}(1)}, U_{h^{-1}(1)}) \le \epsilon$ with failure probability at most $\delta/3$. Such a hypothesis $h$ is easily seen to satisfy
$\Pr[h(x) \neq f(x)] \le \epsilon$. □
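
The three steps of this proof can be sketched as a short procedure. This is an illustrative sketch only: `draw_example` and `reconstruct` are hypothetical callables supplied by the caller, and the sample sizes `est_samples`, `num_pos` and `max_draws` are left as explicit parameters rather than being derived from the $O(\cdot)$ bounds in the statement.

```python
import random

def learn_via_reconstruction(draw_example, reconstruct, num_pos, eps,
                             est_samples, max_draws):
    """Turn a reconstruction routine into a uniform-distribution learner.

    draw_example(): returns (x, f(x)) for a uniform random x (assumed oracle).
    reconstruct(positives): returns a hypothesis function h (assumed routine).
    """
    # Step 1: estimate the density of satisfying assignments.
    hits = sum(1 for _ in range(est_samples) if draw_example()[1] == 1)
    if hits / est_samples < 3 * eps / 4:
        # f is nearly always -1, so the constant -1 hypothesis is accurate enough.
        return lambda x: -1

    # Step 2: keep drawing labeled examples until enough positives are collected.
    positives = []
    for _ in range(max_draws):
        x, label = draw_example()
        if label == 1:
            positives.append(x)
            if len(positives) == num_pos:
                break
    if len(positives) < num_pos:
        raise RuntimeError("too few positive examples within the draw budget")

    # Step 3: positives are uniform over f^{-1}(1); hand them to the reconstructor.
    return reconstruct(positives)
```

The early exit in Step 1 is exactly the "easy way out" that is available under the additive-error criterion and unavailable in the inverse generation setting.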

As described in the following subsection, in this paper we prove negative results for the inverse approximate
uniform generation problem for classes such as 3-CNF formulas, monotone 2-CNF formulas,
and degree-2 polynomial threshold functions. Since efficient uniform-distribution learning algorithms are 
known for these classes, these results show that the inverse approximate uniform generation problem is in- 
deed harder than standard uniform-distribution learning for some natural and interesting classes of functions. 

The problem of inverse approximate uniform generation is also somewhat reminiscent of the problem 
of reconstructing Markov Random Fields (MRFs) from random samples [BMS08, DMR06, Mos07]. Much
progress has been made on this problem over the past decade, especially when the hidden graph is a tree. 
However, there does not seem to be a concrete connection between this problem and the problems we study. 
One reason for this seems to be that in MRF reconstruction, the task is to reconstruct the model and not just 
the distribution; because of this, various conditions need to be imposed in order to guarantee the uniqueness 
of the underlying model given random samples from the distribution. In contrast, in our setting the explicit 
goal is to construct a high-accuracy distribution, and it may indeed be the case that there is no unique 
underlying model (i.e., Boolean function $f$) given the samples received from the distribution.

1.2 Our results. We give a wide range of both positive and negative results for inverse approximate uniform
generation problems. As noted above, most of our results deal with uniform generation of satisfying
assignments, i.e., C is a class of Boolean functions over $\{-1,1\}^n$ and for $f \in C$ the relation $R(f, y)$ is "y
is a satisfying assignment for f." All the results, both positive and negative, that we present below are for
problems of this sort unless indicated otherwise.

Positive results: A general approach and its applications. We begin by presenting a general approach
for obtaining inverse approximate uniform generation algorithms. This technique combines approximate
uniform generation and counting algorithms and Statistical Query (SQ) learning algorithms with a new type
of algorithm called a "densifier," which we introduce and define in Section 3. Very roughly speaking, the
densifier lets us prune the entire space $\{-1,1\}^n$ to a set S which (essentially) contains all of $f^{-1}(1)$ and
is not too much larger than $f^{-1}(1)$ (so $f^{-1}(1)$ is "dense" in S). By generating approximately uniform
elements of S it is possible to run an SQ learning algorithm and obtain a high-accuracy hypothesis which
can be used, in conjunction with the approximate uniform generator, to obtain a sampler for a distribution
which is close to the uniform distribution over $f^{-1}(1)$. (The approximate counting algorithm is needed for
technical reasons which we explain in Section 3.) In Section 3 we describe this technique in detail and
prove a general result establishing its effectiveness.

In Sections 4 and 5 we give two main applications of this general technique to specific classes of functions.
The first of these is the class LTF of all LTFs over $\{-1,1\}^n$. Our main technical contribution here
is to construct a densifier for LTFs; we do this by carefully combining known efficient online learning
algorithms for LTFs (based on interior-point methods for linear programming) [MT94] with known algorithms
for approximate uniform generation and counting of satisfying assignments of LTFs [MS04, Dye03]. Given
this densifier, our general approach yields the desired inverse approximate uniform generator for LTFs:

Theorem 2. (Informal statement) There is a poly$(n, 1/\epsilon)$-time algorithm for the inverse problem of
approximately uniformly generating satisfying assignments for LTFs.

Our second main positive result for a specific class, in Section 5, is for the well-studied class $\mathrm{DNF}_{n,s}$ of
all size-$s$ DNF formulas over $n$ Boolean variables. Here our main technical contribution is to give a densifier
which runs in time $n^{O(\log(s/\epsilon))}$ and outputs a DNF formula. A challenge here is that known SQ algorithms
for learning DNF formulas require time exponential in $n^{1/3}$. To get around this, we view the densifier's output
DNF as an OR over $n^{O(\log(s/\epsilon))}$ "metavariables" (corresponding to all possible conjunctions that could
be present in the DNF output by the densifier), and we show that it is possible to apply known malicious
noise tolerant SQ algorithms for learning sparse disjunctions as the SQ-learning component of our general
approach. Since efficient approximate uniform generation and approximate counting algorithms are known
[JVV86, KL83] for DNF formulas, with the above densifier and SQ learner we can carry out our general
technique, and we thereby obtain our second main positive result for a specific function class:

Theorem 3. (Informal statement) There is an $n^{O(\log(s/\epsilon))}$-time algorithm for the inverse problem of
approximately uniformly generating satisfying assignments for s-term DNF formulas.

Negative results based on cryptography. In light of the "standard approach," it is clear that in order for an
inverse approximate uniform generation problem (C, R) to be computationally hard, it must be the case that
either stage (1) (reconstructing the unknown object) or stage (2) (sampling from the reconstructed object) is
hard. (If both stages have efficient algorithms $A_{\mathrm{reconstruct}}$ and $A_{\mathrm{sample}}$ respectively, then there is an efficient
algorithm for the whole inverse approximate uniform generation problem that combines these algorithms
according to the standard approach.) Our first approach to obtaining negative results can be used to obtain
hardness results for problems for which stage (2), near-uniform sampling, is computationally hard. The
approach is based on signature schemes from public-key cryptography; roughly speaking, the general result
which we prove is the following (we note that the statement given below is a simplification of our actual
result which omits several technical conditions; see Theorem 60 of Section 6 for a precise statement):

Theorem 4. (Informal statement) Let C be a class of functions such that there is a parsimonious reduction
from CIRCUIT-SAT to C-SAT. Then known constructions of secure signature schemes imply that there is
no subexponential-time algorithm for the inverse problem of approximately uniformly generating satisfying
assignments to functions in C.

This theorem yields a wide range of hardness results for specific classes that show that our positive 
results (for LTFs and DNF) lie quite close to the boundary of what classes have efficient inverse approximate 
uniform generation algorithms. We prove: 

Corollary 5. (Informal statement) Under known constructions of secure signature schemes, there is no 
subexponential-time algorithm for the inverse approximate uniform generation problem for either of the 
following classes of functions: (i) 3-CNF formulas; (ii) intersections of two halfspaces.

We show that our signature-scheme-based hardness approach can be extended to settings where there 
is no parsimonious reduction as described above. Using "blow-up"-type constructions of the sort used to 
prove hardness of approximate counting, we prove the following: 

Theorem 6. (Informal statement) Under the same assumptions as Corollary 5, there is no subexponential-time
algorithm for the inverse approximate uniform generation problem for either of the following classes:
(i) monotone 2-CNF; (ii) degree-2 polynomial threshold functions. 

It is instructive to compare the above hardness results with the problem of uniform generation of NP- 
witnesses. In particular, while it is obvious that no efficient randomized algorithm can produce even a single 
satisfying assignment of a given 3-SAT instance (assuming $\mathrm{NP} \not\subseteq \mathrm{BPP}$), the seminal results of Jerrum
et al. [JVV86] showed that given access to an NP-oracle, it is possible to generate approximately uniform
satisfying assignments for a given 3-SAT instance. It is interesting to ask whether one requires the full 
power of adaptive access to NP-oracles for this task, or whether a weaker form of "advice" suffices. Our 
hardness results can be understood in this context as giving evidence that receiving polynomially many 
random satisfying assignments of a 3-SAT instance does not help in further uniform generation of satisfying 
assignments.¹

Our signature-scheme based approach cannot give hardness results for problems that have polynomial- 
time algorithms for the "forward" problem of sampling approximately uniform satisfying assignments. Our 
second approach to proving computational hardness can (at least sometimes) surmount this barrier. The 
approach is based on Message Authentication Codes in cryptography; the following is an informal statement 
of our general result along these lines (as before the following statement ignores some technical conditions; 
see Theorem 80 for a precise statement):

Theorem 7. (Informal statement) There are known constructions of MACs with the following property:
Let C be a class of circuits such that the verification algorithm of the MAC can be implemented in C. Then 
there is no subexponential-time inverse approximate uniform generation algorithm for C. 

We instantiate this general result with a specific construction of a MAC that is a slight variant of a 
construction due to Pietrzak [Pie12]. This specific construction yields a class C for which the "forward"
approximate uniform generation problem is computationally easy, but (under a plausible computational 
hardness assumption) the inverse approximate uniform generation problem is computationally hard. 

The above construction based on MACs shows that there are problems (C, R) for which the inverse
approximate uniform generation problem is computationally hard although the "forward" approximate uniform
generation problem is easy. As our last result, we exhibit a group-theoretic problem (based on graph
automorphisms) for which the reverse situation holds: under a plausible hardness assumption the forward
approximate uniform generation problem is computationally hard, but we give an efficient algorithm for
the inverse approximate uniform generation problem (which does not follow our general technique or the
"standard approach").

¹ There is a small caveat here in that we are not given the 3-SAT formula per se but rather access to random satisfying assignments
of the formula. However, there is a simple elimination-based algorithm to reconstruct a high-accuracy approximation for a 3-SAT
formula if we have access to random satisfying assignments for the formula.



Structure of this paper. After the preliminaries in Section 2, we present in Section 3 our general upper
bound technique. In Sections 4 and 5 we apply this technique to obtain efficient inverse approximate uniform
generation algorithms for LTFs and DNFs respectively. Section 6 contains our hardness results. In Section 7
we give an example of a problem for which approximate uniform generation is hard, while the inverse
problem is easy. Finally, in Section 8 we conclude the paper by suggesting directions for future work.



2 Preliminaries and Useful Tools 

2.1 Notation and Definitions. For $n \in \mathbb{Z}_+$, we will denote by $[n]$ the set $\{1, \ldots, n\}$. For a distribution D
over a finite set W we denote by $D(x)$, $x \in W$, the probability mass that D assigns to point $x$, so $D(x) \ge 0$
and $\sum_{x \in W} D(x) = 1$. For $S \subseteq W$, we write $D(S)$ to denote $\sum_{x \in S} D(x)$. For a finite set X we write
$x \in_U X$ to indicate that $x$ is chosen uniformly at random from X. For a random variable $x$, we will write
$x \sim D$ to denote that $x$ follows distribution D. Let $D, D'$ be distributions over W. The total variation
distance between D and D' is

$$d_{TV}(D, D') \stackrel{\mathrm{def}}{=} \max_{S \subseteq W} |D(S) - D'(S)| = (1/2) \cdot \|D - D'\|_1,$$

where $\|D - D'\|_1 = \sum_{x \in W} |D(x) - D'(x)|$ is the $L_1$-distance between D and D'.
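
The equality between the two expressions for total variation distance can be checked by brute force on a small example. The four-point set and the two distributions below are arbitrary choices made for this sketch, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

def tv_half_l1(D, Dp):
    """(1/2) * ||D - D'||_1 for distributions given as dicts over the same set W."""
    return sum(abs(D[w] - Dp[w]) for w in D) / 2

def tv_max_subset(D, Dp):
    """max over all subsets S of W of |D(S) - D'(S)|, by exhaustive enumeration."""
    points = list(D)
    best = Fraction(0)
    for r in range(len(points) + 1):
        for S in combinations(points, r):
            gap = abs(sum(D[w] for w in S) - sum(Dp[w] for w in S))
            best = max(best, gap)
    return best

# Hypothetical example distributions over W = {a, b, c, d}.
D = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
Dp = {w: Fraction(1, 4) for w in D}
```

On this example both functions return 1/4; the maximum is attained by the set of points where D exceeds D', here {a}. The enumeration takes time exponential in |W|, so it is only meant as a check of the definition on tiny sets.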

We will denote by $C_n$, or simply C, a Boolean concept class, i.e., a class of functions mapping $\{-1,1\}^n$
to $\{-1,1\}$. We usually consider syntactically defined classes of functions such as the class of all n-variable
linear threshold functions or the class of all n-variable s-term DNF formulas. We stress that throughout this
paper a class C is viewed as a representation class. Thus we will say that an algorithm "takes as input a
function $f \in C$" to mean that the input of the algorithm is a representation of $f \in C$.

We will use the notation $U_n$ (or simply $U$, when the dimension $n$ is clear from the context) for the
uniform distribution over $\{-1,1\}^n$. Let $f : \{-1,1\}^n \to \{-1,1\}$. We will denote by $U_{f^{-1}(1)}$ the uniform
distribution over satisfying assignments of $f$. Let D be a distribution over $\{-1,1\}^n$ with $0 < D(f^{-1}(1)) \le 1$.
We write $D_{f,+}$ to denote the conditional distribution D restricted to $f^{-1}(1)$; so for $x \in f^{-1}(1)$ we have
$D_{f,+}(x) = D(x)/D(f^{-1}(1))$. Observe that, with this notation, we have that $U_{f^{-1}(1)} = U_{f,+}$.

We proceed to define the notions of approximate counting and approximate uniform generation for a 
class of Boolean functions: 

Definition 8 (approximate counting). Let C be a class of n-variable Boolean functions. A randomized
algorithm $A_{\mathrm{count}}$ is an efficient approximate counting algorithm for class C, if for any $\epsilon, \delta > 0$ and any
$f \in C$, on input $\epsilon, \delta$ and $f \in C$, it runs in time poly$(n, 1/\epsilon, \log(1/\delta))$ and with probability $1 - \delta$ outputs a
value $p$ such that

$$\frac{1}{1+\epsilon} \cdot \Pr_{x \sim U}[f(x) = 1] \le p \le (1+\epsilon) \cdot \Pr_{x \sim U}[f(x) = 1].$$

Definition 9 (approximate uniform generation). Let C be a class of n-variable Boolean functions. A randomized
algorithm $A_{\mathrm{gen}}$ is an efficient approximate uniform generation algorithm for class C, if for any
$\epsilon > 0$ and any $f \in C$, there is a distribution $D = D_{f,\epsilon}$ supported on $f^{-1}(1)$ with

$$\frac{1}{1+\epsilon} \cdot \frac{1}{|f^{-1}(1)|} \le D(x) \le (1+\epsilon) \cdot \frac{1}{|f^{-1}(1)|}$$

for each $x \in f^{-1}(1)$, such that for any $\delta > 0$, on input $\epsilon, \delta$ and $f \in C$, algorithm $A_{\mathrm{gen}}(\epsilon, \delta, f)$ runs in
time poly$(n, 1/\epsilon, \log(1/\delta))$ and either outputs a point $x \in f^{-1}(1)$ that is distributed precisely according to
$D = D_{f,\epsilon}$, or outputs $\bot$. Moreover the probability that it outputs $\bot$ is at most $\delta$.

An approximate uniform generation algorithm is said to be fully polynomial if its running time dependence
on $\epsilon$ is poly$(\log(1/\epsilon))$.

Before we define our inverse approximate uniform generation problem, we need the notion of a sampler 
for a distribution: 

Definition 10. Let D be a distribution over $\{-1,1\}^n$. A sampler for D is a circuit C with $m = \mathrm{poly}(n)$
input bits $z \in \{-1,1\}^m$ and $n$ output bits $x \in \{-1,1\}^n$ which is such that when $z \sim U_m$ then $x \sim D$. For
$\epsilon > 0$, an $\epsilon$-sampler for D is a sampler for some distribution D' which has $d_{TV}(D', D) \le \epsilon$.

For clarity we sometimes write "C is a 0-sampler for D" to emphasize the fact that the outputs of C(z) 
are distributed exactly according to distribution D. We are now ready to formally define the notion of an 
inverse approximate uniform generation algorithm: 

Definition 11 (inverse approximate uniform generation). Let C be a class of n-variable Boolean functions.
A randomized algorithm $A_{\mathrm{inv}}$ is an inverse approximate uniform generation algorithm for class C, if for any
$\epsilon, \delta > 0$ and any $f \in C$, on input $\epsilon, \delta$ and sample access to $U_{f^{-1}(1)}$, with probability $1 - \delta$ algorithm $A_{\mathrm{inv}}$
outputs an $\epsilon$-sampler $C_f$ for $U_{f^{-1}(1)}$.

2.2 Hypothesis Testing. Our general approach works by generating a collection of hypothesis distributions,
one of which is close to the target distribution $U_{f^{-1}(1)}$. Thus, we need a way to select a high-accuracy
hypothesis distribution from a pool of candidate distributions which contains at least one high-accuracy
hypothesis. This problem has been well studied; see e.g. Chapter 7 of [DL01]. We use the following result,
which is an extension of Lemma C.1 of [DDS12a].

Proposition 12. Let D be a distribution over a finite set W and $\mathcal{D}_\epsilon = \{D_j\}_{j=1}^{N}$ be a collection of N
distributions over W with the property that there exists $i \in [N]$ such that $d_{TV}(D, D_i) \le \epsilon$. There is an
algorithm $T^D$, which is given access to:

(i) samplers for D and $D_k$, for all $k \in [N]$,

(ii) a $(1+\beta)$-approximate evaluation oracle $\mathrm{EVAL}_{D_k}(\beta)$, for all $k \in [N]$, which, on input $w \in W$,
deterministically outputs a value $\tilde{D}_k(w)$ such that $D_k(w)/(1+\beta) \le \tilde{D}_k(w) \le (1+\beta) D_k(w)$,
where $\beta > 0$ is any parameter satisfying $(1+\beta)^2 \le 1 + \epsilon/8$,

an accuracy parameter $\epsilon$ and a confidence parameter $\delta$, and has the following behavior: It makes

$$m = O\big((1/\epsilon^2) \cdot (\log N + \log(1/\delta))\big)$$

draws from D and from each $D_k$, $k \in [N]$, and $O(m)$ calls to each oracle $\mathrm{EVAL}_{D_k}(\beta)$, $k \in [N]$,
performs $O(mN^2)$ arithmetic operations, and with probability $1 - \delta$ outputs an index $i^* \in [N]$ that satisfies
$d_{TV}(D, D_{i^*}) \le 6\epsilon$.

Before we proceed with the proof, we note that there are certain crucial differences between the current 
setting and the setting of [DDS12a, DDS12b] (as well as other related works that use versions of 
Proposition 12). In particular, in our setting, the set $W$ is of size $2^n$, which was not the case in [DDS12a, DDS12b]. 
Hence, we cannot assume the distributions $D_i$ are given explicitly in the input. Thus Proposition 12 carefully 
specifies what kind of access to these distributions is required. Proposition 12 is an extension of 
similar results in the previous works; while the idea of the proof is essentially the same, the details are more 
involved. 

Proof of Proposition 12. At a high level, the algorithm $\mathcal{T}^D$ performs a tournament by running a "competition" 
Choose-Hypothesis$^D$ for every pair of distinct distributions in the collection $\mathcal{D}_\epsilon$. It outputs a 
distribution $D^* \in \mathcal{D}_\epsilon$ that was never a loser (i.e., won or achieved a draw in all its competitions). If no 
such distribution exists in $\mathcal{D}_\epsilon$ then the algorithm outputs "failure." We start by describing and analyzing the 
competition subroutine between a pair of distributions in the collection. 

Lemma 13. In the context of Proposition 12, there is an algorithm Choose-Hypothesis$^D(D_i, D_j, \epsilon', \delta')$ 
which is given access to 

(i) independent samples from $D$ and $D_k$, for $k \in \{i, j\}$, 

(ii) an evaluation oracle $\mathrm{EVAL}_{D_k}(\beta)$, for $k \in \{i, j\}$, 

an accuracy parameter $\epsilon'$ and a confidence parameter $\delta'$, and has the following behavior: It uses 
$m' = O\big((1/\epsilon'^2) \log(1/\delta')\big)$ samples from each of $D$, $D_i$ and $D_j$, it makes $O(m')$ calls to the oracles $\mathrm{EVAL}_{D_k}(\beta)$, 
$k \in \{i, j\}$, performs $O(m')$ arithmetic operations, and if some $D_k$, $k \in \{i, j\}$, has $d_{TV}(D_k, D) \le \epsilon'$ then 
with probability $1 - \delta'$ it outputs an index $k^* \in \{i, j\}$ that satisfies $d_{TV}(D, D_{k^*}) \le 6\epsilon'$. 

Proof. To set up the competition between $D_i$ and $D_j$, we consider the following subset of $W$: 

$H_{ij} = H_{ij}(D_i, D_j) \stackrel{\mathrm{def}}{=} \{w \in W \mid D_i(w) \ge D_j(w)\}$ 

and the corresponding probabilities $p_{ij} \stackrel{\mathrm{def}}{=} D_i(H_{ij})$ and $q_{ij} \stackrel{\mathrm{def}}{=} D_j(H_{ij})$. Clearly, it holds that $p_{ij} \ge q_{ij}$, and by 
definition of the total variation distance we can write 

$d_{TV}(D_i, D_j) = p_{ij} - q_{ij}.$ 

For the purposes of our algorithm, we would ideally want oracle access to the set $H_{ij}$. Unfortunately, 
this is not possible since the evaluation oracles are only approximate. Hence, we will need to define 
a more robust version of the set $H_{ij}$ which will turn out to have similar properties. In particular, we consider 
the set 

$H^{\beta}_{ij} \stackrel{\mathrm{def}}{=} \{w \in W \mid \tilde{D}^{\beta}_i(w) \ge \tilde{D}^{\beta}_j(w)\}$ 

and the corresponding probabilities $p^{\beta}_{ij} \stackrel{\mathrm{def}}{=} D_i(H^{\beta}_{ij})$ and $q^{\beta}_{ij} \stackrel{\mathrm{def}}{=} D_j(H^{\beta}_{ij})$. We claim that the difference 
$\Delta \stackrel{\mathrm{def}}{=} p^{\beta}_{ij} - q^{\beta}_{ij}$ is an accurate approximation to $d_{TV}(D_i, D_j)$. In particular, we show: 

Claim 14. We have 

$\Delta \le d_{TV}(D_i, D_j) \le \Delta + \epsilon/4.$ (1) 

Before we proceed with the proof, we stress that (1) crucially uses our assumption that the evaluation 
oracles provide a multiplicative approximation to the exact probabilities. 

Proof. To show (1) we proceed as follows: Let $A = H_{ij} \cap H^{\beta}_{ij}$, $B = H_{ij} \cap \overline{H^{\beta}_{ij}}$ and $C = \overline{H_{ij}} \cap H^{\beta}_{ij}$. Then 
we can write 

$d_{TV}(D_i, D_j) = (D_i - D_j)(A) + (D_i - D_j)(B)$ 

and 

$\Delta = (D_i - D_j)(A) + (D_i - D_j)(C).$ 

We will show that 

$0 \le (D_i - D_j)(B) \le \epsilon/8$ (2) 

and similarly 

$-\epsilon/8 \le (D_i - D_j)(C) \le 0$ (3) 

from which the claim follows. We proceed to prove (2), the proof of (3) being very similar. Let $w \in B$. Then 
$D_i(w) \ge D_j(w)$ (since $w \in H_{ij}$), which gives $(D_i - D_j)(B) \ge 0$, establishing the LHS of (2). We now 
establish the RHS. For $w \in B$ we also have that $\tilde{D}^{\beta}_i(w) < \tilde{D}^{\beta}_j(w)$ (since $w \in \overline{H^{\beta}_{ij}}$). Now by the definition 
of the evaluation oracles, it follows that $\tilde{D}^{\beta}_i(w) \ge D_i(w)/(1+\beta)$ and $\tilde{D}^{\beta}_j(w) \le (1+\beta) D_j(w)$. Combining these 
inequalities yields 

$D_i(w) \le (1+\beta)^2 D_j(w) \le (1 + \epsilon/8) D_j(w)$ 

where the second inequality follows by our choice of $\beta$. Therefore, 

$(D_i - D_j)(B) = \sum_{w \in B} (D_i(w) - D_j(w)) \le (\epsilon/8) \cdot D_j(B) \le \epsilon/8$ 

as desired. □ 
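Claim 14 can be sanity-checked numerically on a small domain. The following is a minimal sketch, not part of the original analysis: the function name `check_claim14`, the toy domain size, and the way the approximate oracle is modeled (each exact probability is perturbed once by a fixed factor in $[1/(1+\beta), 1+\beta]$) are all assumptions made for illustration.

```python
import random

def check_claim14(n_points=64, eps=0.2, seed=0):
    """Return (Delta, d_TV, eps) for two random distributions on a toy domain."""
    rng = random.Random(seed)
    # Largest beta allowed by Proposition 12: (1 + beta)^2 <= 1 + eps/8.
    beta = (1 + eps / 8) ** 0.5 - 1

    def random_dist():
        w = [rng.random() for _ in range(n_points)]
        total = sum(w)
        return [x / total for x in w]

    def noisy(dist):
        # Hypothetical deterministic oracle: each value is off by a fixed
        # multiplicative factor in [1/(1+beta), 1+beta].
        return [p * (1 + beta) ** rng.uniform(-1, 1) for p in dist]

    Di, Dj = random_dist(), random_dist()
    Ti, Tj = noisy(Di), noisy(Dj)

    # d_TV(D_i, D_j) = p_ij - q_ij, the mass gap on H_ij = {w : D_i(w) >= D_j(w)}.
    d_tv = sum(max(a - b, 0.0) for a, b in zip(Di, Dj))
    # Delta is the same gap measured on the robust set H^beta_ij.
    H_beta = [k for k in range(n_points) if Ti[k] >= Tj[k]]
    delta = sum(Di[k] - Dj[k] for k in H_beta)
    return delta, d_tv, eps
```

Under these assumptions the returned values should satisfy $\Delta \le d_{TV}(D_i, D_j) \le \Delta + \epsilon/4$, up to floating-point error.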

Note that the probabilities $p^{\beta}_{ij}$ and $q^{\beta}_{ij}$ are not available to us explicitly. Hence, Choose-Hypothesis 
requires a way to empirically estimate each of these probability values (up to a small additive accuracy). 
This task can be done efficiently because we have sample access to the distributions $D_i$, $D_j$ and oracle 
access to the set $H^{\beta}_{ij}$ thanks to the $\mathrm{EVAL}_{D_k}(\beta)$ oracles. The following claim provides the details: 

Claim 15. There exists a subroutine Estimate$(D_i, H^{\beta}_{ij}, \gamma, \delta)$ which is given access to 

(i) independent samples from $D_i$, 

(ii) an evaluation oracle $\mathrm{EVAL}_{D_k}(\beta)$, for $k \in \{i, j\}$, 

an accuracy parameter $\gamma$ and a confidence parameter $\delta$, and has the following behavior: It makes 
$m = O\big((1/\gamma^2) \log(1/\delta)\big)$ draws from $D_i$ and $O(m)$ calls to the oracles $\mathrm{EVAL}_{D_k}(\beta)$, $k = i, j$, performs $O(m)$ 
arithmetic operations, and with probability $1 - \delta$ outputs a number $\tilde{p}^{\beta}_{ij}$ such that $|\tilde{p}^{\beta}_{ij} - p^{\beta}_{ij}| \le \gamma$. 

Proof. The desired subroutine amounts to a straightforward random sampling procedure, which we include 
here for the sake of completeness. We will use the following elementary fact, a simple consequence of the 
additive Chernoff bound. 

Fact 16. Let $X$ be a random variable taking values in the range $[-1, 1]$. Then $\mathbf{E}[X]$ can be estimated to 
within an additive $\pm\tau$, with confidence probability $1 - \delta$, using $m = \Omega\big((1/\tau^2) \log(1/\delta)\big)$ independent 
samples from $X$. In particular, the empirical average $\hat{X}_m = (1/m) \sum_{i=1}^{m} X_i$, where the $X_i$'s are independent 
samples of $X$, satisfies $\Pr\big[|\hat{X}_m - \mathbf{E}[X]| \le \tau\big] \ge 1 - \delta$. 

We shall refer to this as "empirically estimating" the value of $\mathbf{E}[X]$. 
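To make Fact 16 concrete, here is a minimal sketch of "empirical estimation". The explicit constant comes from the standard Hoeffding bound for variables in $[-1, 1]$, which gives $\Pr[|\hat{X}_m - \mathbf{E}[X]| > \tau] \le 2\exp(-m\tau^2/2)$; the function names `samples_needed` and `empirical_mean` are hypothetical and chosen only for this illustration.

```python
import math
import random

def samples_needed(tau, delta):
    """Sample size m with 2 * exp(-m * tau^2 / 2) <= delta (Hoeffding, range [-1, 1])."""
    return math.ceil((2.0 / tau ** 2) * math.log(2.0 / delta))

def empirical_mean(draw, tau, delta):
    """Estimate E[X] to within +/- tau with confidence 1 - delta.

    `draw` is a zero-argument callable returning one independent sample
    of X, assumed to lie in [-1, 1].
    """
    m = samples_needed(tau, delta)
    return sum(draw() for _ in range(m)) / m
```

For example, `empirical_mean(lambda: rng.choice([-1, 1, 1, 1]), 0.1, 1e-9)` with `rng = random.Random(1)` estimates a mean of 0.5 using a few thousand draws.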

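Consider the indicator function $I_{H^{\beta}_{ij}}$ of the set $H^{\beta}_{ij}$, i.e., $I_{H^{\beta}_{ij}} : W \to \{0, 1\}$ with $I_{H^{\beta}_{ij}}(x) = 1$ if and 
only if $x \in H^{\beta}_{ij}$. It is clear that $\mathbf{E}_{x \sim D_i}\big[I_{H^{\beta}_{ij}}(x)\big] = D_i(H^{\beta}_{ij}) = p^{\beta}_{ij}$. The subroutine is described in the 
following pseudocode: 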



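Subroutine Estimate$(D_i, H^{\beta}_{ij}, \gamma, \delta)$: 

Input: Sample access to $D_i$ and oracle access to $\mathrm{EVAL}_{D_k}(\beta)$, $k = i, j$. 

Output: A number $\tilde{p}^{\beta}_{ij}$ such that with probability $1 - \delta$ it holds $|\tilde{p}^{\beta}_{ij} - D_i(H^{\beta}_{ij})| \le \gamma$. 

1. Draw $m = \Theta\big((1/\gamma^2) \log(1/\delta)\big)$ samples $s = \{s_\ell\}_{\ell=1}^{m}$ from $D_i$. 

2. For each sample $s_\ell$, $\ell \in [m]$: 

(a) Use the oracles $\mathrm{EVAL}_{D_i}(\beta)$, $\mathrm{EVAL}_{D_j}(\beta)$ to approximately evaluate $D_i(s_\ell)$, $D_j(s_\ell)$. 

(b) If $\tilde{D}^{\beta}_i(s_\ell) \ge \tilde{D}^{\beta}_j(s_\ell)$ set $I_{H^{\beta}_{ij}}(s_\ell) = 1$, otherwise $I_{H^{\beta}_{ij}}(s_\ell) = 0$. 

3. Set $\tilde{p}^{\beta}_{ij} = (1/m) \sum_{\ell=1}^{m} I_{H^{\beta}_{ij}}(s_\ell)$. 

4. Output $\tilde{p}^{\beta}_{ij}$. 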
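The Estimate subroutine above can be sketched in a few lines. This is an illustrative sketch only: the callables `sample_Di`, `eval_i` and `eval_j` stand in for the sampler and the approximate evaluation oracles, and the constant in the sample size is a conservative Hoeffding-style choice rather than one taken from the paper.

```python
import math
import random

def estimate(sample_Di, eval_i, eval_j, gamma, delta):
    """Estimate D_i(H) where H = {w : eval_i(w) >= eval_j(w)}.

    sample_Di: zero-argument callable returning one draw from D_i.
    eval_i, eval_j: callables playing the role of the approximate
                    evaluation oracles for D_i and D_j.
    """
    # Conservative sample size; the indicator lies in [0, 1].
    m = math.ceil((2.0 / gamma ** 2) * math.log(2.0 / delta))
    hits = 0
    for _ in range(m):
        s = sample_Di()
        # Membership in H is decided with one query to each oracle.
        if eval_i(s) >= eval_j(s):
            hits += 1
    return hits / m
```

Each sample costs one query to each oracle and one comparison, matching the $O(m)$ query and arithmetic bounds in Claim 15.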



The computational efficiency of this simple random sampling procedure follows from the fact that we 
can efficiently decide membership in $H^{\beta}_{ij}$. To do this, for a given $x \in W$, we make a query to each of the 
oracles $\mathrm{EVAL}_{D_i}(\beta)$, $\mathrm{EVAL}_{D_j}(\beta)$ to obtain the values $\tilde{D}^{\beta}_i(x)$, $\tilde{D}^{\beta}_j(x)$. We have that $x \in H^{\beta}_{ij}$ (or 
equivalently $I_{H^{\beta}_{ij}}(x) = 1$) if and only if $\tilde{D}^{\beta}_i(x) \ge \tilde{D}^{\beta}_j(x)$. By Fact 16, applied to the random variable 
$I_{H^{\beta}_{ij}}(x)$, where $x \sim D_i$, after $m = \Omega\big((1/\gamma^2) \log(1/\delta)\big)$ samples from $D_i$ we obtain a $\pm\gamma$-additive estimate 
of $p^{\beta}_{ij}$ with probability $1 - \delta$. For each sample, we make one query to each of the oracles, hence the total 
number of oracle queries is $O(m)$ as desired. The only non-trivial arithmetic operations are the $O(m)$ 
comparisons done in Step 2(b), and Claim 15 is proved. □ 

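Now we are ready to prove Lemma 13. The algorithm Choose-Hypothesis$^D(D_i, D_j, \epsilon', \delta')$ 
performing the competition between $D_i$ and $D_j$ is the following: 

Algorithm Choose-Hypothesis$^D(D_i, D_j, \epsilon', \delta')$: 

Input: Sample access to $D$ and $D_k$, $k = i, j$, oracle access to $\mathrm{EVAL}_{D_k}(\beta)$, $k = i, j$. 

1. Set $\tilde{p}^{\beta}_{ij}$ = Estimate$(D_i, H^{\beta}_{ij}, \epsilon'/8, \delta'/4)$ and $\tilde{q}^{\beta}_{ij}$ = Estimate$(D_j, H^{\beta}_{ij}, \epsilon'/8, \delta'/4)$. 

2. If $\tilde{p}^{\beta}_{ij} - \tilde{q}^{\beta}_{ij} \le 9\epsilon'/2$, declare a draw and return either $i$ or $j$. Otherwise: 

3. Draw $m' = \Theta\big((1/\epsilon'^2) \log(1/\delta')\big)$ samples $s' = \{s_\ell\}_{\ell=1}^{m'}$ from $D$. 

4. For each sample $s_\ell$, $\ell \in [m']$: 

(a) Use the oracles $\mathrm{EVAL}_{D_i}(\beta)$, $\mathrm{EVAL}_{D_j}(\beta)$ to evaluate $\tilde{D}^{\beta}_i(s_\ell)$, $\tilde{D}^{\beta}_j(s_\ell)$. 

(b) If $\tilde{D}^{\beta}_i(s_\ell) \ge \tilde{D}^{\beta}_j(s_\ell)$ set $I_{H^{\beta}_{ij}}(s_\ell) = 1$, otherwise $I_{H^{\beta}_{ij}}(s_\ell) = 0$. 

5. Set $\tau = (1/m') \sum_{\ell=1}^{m'} I_{H^{\beta}_{ij}}(s_\ell)$, i.e., $\tau$ is the fraction of samples that fall inside $H^{\beta}_{ij}$. 

6. If $\tau > \tilde{p}^{\beta}_{ij} - \frac{13}{8}\epsilon'$, declare $D_i$ as winner and return $i$; otherwise, 

7. if $\tau < \tilde{q}^{\beta}_{ij} + \frac{13}{8}\epsilon'$, declare $D_j$ as winner and return $j$; otherwise, 

8. declare a draw and return either $i$ or $j$. 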
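The competition can be sketched as follows. This is a hedged illustration, not the authors' implementation: samplers and oracles are passed in as plain callables, the return values `"i"`, `"j"` and `"draw"` are a convention chosen here, and the explicit constants in the two sample sizes are assumptions (the pseudocode above only fixes them up to constant factors).

```python
import math
import random

def choose_hypothesis(sample_D, sample_Di, sample_Dj, eval_i, eval_j, eps, delta):
    """Run one competition; return "i", "j", or "draw"."""
    def frac_in_H(sample, m):
        # Fraction of m draws that land in H = {w : eval_i(w) >= eval_j(w)}.
        hits = 0
        for _ in range(m):
            s = sample()
            if eval_i(s) >= eval_j(s):
                hits += 1
        return hits / m

    # Step 1: estimate p and q to within eps/8, each with confidence delta/4.
    m_est = math.ceil((2.0 / (eps / 8) ** 2) * math.log(8.0 / delta))
    p_hat = frac_in_H(sample_Di, m_est)
    q_hat = frac_in_H(sample_Dj, m_est)

    # Step 2: the two candidates are too close to separate.
    if p_hat - q_hat <= 4.5 * eps:
        return "draw"

    # Steps 3-5: fraction tau of samples from the target D that fall in H.
    m = math.ceil((8.0 / eps ** 2) * math.log(2.0 / delta))
    tau = frac_in_H(sample_D, m)

    # Steps 6-8: compare tau against the two thresholds.
    if tau > p_hat - 13.0 * eps / 8:
        return "i"
    if tau < q_hat + 13.0 * eps / 8:
        return "j"
    return "draw"
```

Once Step 2 has been passed, the two thresholds in Steps 6 and 7 are separated by more than $5\epsilon'/4$, so at most one of the two winner conditions can hold.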
It is not hard to check that the outcome of the competition does not depend on the ordering of the pair 
of distributions provided in the input; that is, on inputs $(D_i, D_j)$ and $(D_j, D_i)$ the competition outputs the 
same result for a fixed set of samples $\{s_1, \ldots, s_{m'}\}$ drawn from $D$. 

The upper bounds on sample complexity, query complexity and number of arithmetic operations can 
be straightforwardly verified. Hence, it remains to show correctness. By Claim 15 and a union bound, 
with probability at least $1 - \delta'/2$, we will have that $|\tilde{p}^{\beta}_{ij} - p^{\beta}_{ij}| \le \epsilon'/8$ and $|\tilde{q}^{\beta}_{ij} - q^{\beta}_{ij}| \le \epsilon'/8$. In 
the following, we condition on this good event. The correctness of Choose-Hypothesis is then an 
immediate consequence of the following claim. 

Claim 17. Suppose that $d_{TV}(D, D_i) \le \epsilon'$. Then: 

(i) If $d_{TV}(D, D_j) > 6\epsilon'$, then the probability that the competition between $D_i$ and $D_j$ does not declare 
$D_i$ as the winner is at most $e^{-m'\epsilon'^2/8}$. (Intuitively, if $D_j$ is very far from $D$ then it is very likely that 
$D_i$ will be declared winner.) 

(ii) The probability that the competition between $D_i$ and $D_j$ declares $D_j$ as the winner is at most $e^{-m'\epsilon'^2/8}$. 
(Intuitively, since $D_i$ is close to $D$, a draw with some other $D_j$ is possible, but it is very unlikely that 
$D_j$ will be declared winner.) 

Proof. Let $r^{\beta} = D(H^{\beta}_{ij})$. The definition of the variation distance implies that $|r^{\beta} - p^{\beta}_{ij}| \le d_{TV}(D, D_i) \le \epsilon'$. 
Therefore, we have that $|r^{\beta} - \tilde{p}^{\beta}_{ij}| \le |r^{\beta} - p^{\beta}_{ij}| + |p^{\beta}_{ij} - \tilde{p}^{\beta}_{ij}| \le 9\epsilon'/8$. Consider the indicator (0/1) 
random variables $Z_\ell$ defined as $Z_\ell = 1$ if and only if $s_\ell \in H^{\beta}_{ij}$. Clearly, $\tau = (1/m') \sum_{\ell=1}^{m'} Z_\ell$ and 
$\mathbf{E}_{s'}[\tau] = \mathbf{E}_{s_\ell}[Z_\ell] = r^{\beta}$. Since the $Z_\ell$'s are mutually independent, it follows from the Chernoff bound that 
$\Pr[\tau \le r^{\beta} - \epsilon'/2] \le e^{-m'\epsilon'^2/8}$. Using $|r^{\beta} - \tilde{p}^{\beta}_{ij}| \le 9\epsilon'/8$, we get that $\Pr[\tau \le \tilde{p}^{\beta}_{ij} - 13\epsilon'/8] \le e^{-m'\epsilon'^2/8}$. 

• For part (i): If $d_{TV}(D, D_j) > 6\epsilon'$, from the triangle inequality we get that $p_{ij} - q_{ij} = d_{TV}(D_i, D_j) > 5\epsilon'$. 
Claim 14 implies that $p^{\beta}_{ij} - q^{\beta}_{ij} > 19\epsilon'/4$, and our conditioning finally gives $\tilde{p}^{\beta}_{ij} - \tilde{q}^{\beta}_{ij} > 9\epsilon'/2$. 
Hence, the algorithm will go beyond Step 2, and with probability at least $1 - e^{-m'\epsilon'^2/8}$ it will stop at 
Step 6, declaring $D_i$ as the winner of the competition between $D_i$ and $D_j$. 

• For part (ii): If $\tilde{p}^{\beta}_{ij} - \tilde{q}^{\beta}_{ij} \le 9\epsilon'/2$ then the competition declares a draw, hence $D_j$ is not the winner. 
Otherwise we have $\tilde{p}^{\beta}_{ij} - \tilde{q}^{\beta}_{ij} > 9\epsilon'/2$ and the argument of the previous paragraph implies that the 
competition between $D_i$ and $D_j$ will declare $D_j$ as the winner with probability at most $e^{-m'\epsilon'^2/8}$. 

This concludes the proof of Claim 17. □ 

This completes the proof of Lemma 13. □ 

We now proceed to describe the algorithm $\mathcal{T}^D$ and establish Proposition 12. The algorithm performs 
a tournament by running the competition Choose-Hypothesis$^D(D_i, D_j, \epsilon, \delta/(2N))$ for every pair of 
distinct distributions $D_i$, $D_j$ in the collection $\mathcal{D}_\epsilon$. It outputs a distribution $D^* \in \mathcal{D}_\epsilon$ that was never a loser 
(i.e., won or achieved a draw in all its competitions). If no such distribution exists in $\mathcal{D}_\epsilon$ then the algorithm 
outputs "failure." A detailed pseudocode follows: 

Algorithm $\mathcal{T}^D(\{D_j\}_{j=1}^{N}, \epsilon, \delta)$: 

Input: Sample access to $D$ and $D_k$, $k \in [N]$, and oracle access to $\mathrm{EVAL}_{D_k}$, $k \in [N]$. 

1. Draw $m = \Theta\big((1/\epsilon^2)(\log N + \log(1/\delta))\big)$ samples from $D$ and each $D_k$, $k \in [N]$. 

2. For all $i, j \in [N]$, $i \ne j$, run Choose-Hypothesis$^D(D_i, D_j, \epsilon, \delta/(2N))$ using this sample. 

3. Output an index $i^*$ such that $D_{i^*}$ was never declared a loser, if one exists. 

4. Otherwise, output "failure". 
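The tournament logic itself is independent of how a single competition is implemented. The sketch below takes the competition as a callback, which is an assumption made for brevity: `compete(i, j)` is a hypothetical function returning the winning index, or `None` for a draw. Because the outcome of a competition does not depend on the order of the pair, the sketch runs each unordered pair once.

```python
def tournament(n, compete):
    """Return an index that never lost a competition, or None on failure.

    n: number of candidate distributions, indexed 0..n-1.
    compete: callable (i, j) -> i, j, or None (draw); a stand-in for
             Choose-Hypothesis run on a shared sample.
    """
    lost = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            winner = compete(i, j)
            if winner == i:
                lost[j] = True
            elif winner == j:
                lost[i] = True
    for k in range(n):
        if not lost[k]:
            return k
    return None  # every candidate lost at least once: "failure"
```

With $N$ candidates this performs $N(N-1)/2$ competitions, which is consistent with the $O(mN^2)$ arithmetic bound of Proposition 12.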



We now proceed to analyze the algorithm. The bounds on the sample complexity, running time and 
query complexity to the evaluation oracles follow from the corresponding bounds for Choose-Hypothesis. 
Hence, it suffices to show correctness. We do this below. 

By definition, there exists some $D_i \in \mathcal{D}_\epsilon$ such that $d_{TV}(D, D_i) \le \epsilon$. By Claim 17, the distribution $D_i$ 
never loses a competition against any other $D_j \in \mathcal{D}_\epsilon$ (so the algorithm does not output "failure"). A union 
bound over all $N$ distributions in $\mathcal{D}_\epsilon$ shows that with probability $1 - \delta/2$, the distribution $D_i$ never loses a 
competition. 

We next argue that with probability at least $1 - \delta/2$, every distribution $D_j \in \mathcal{D}_\epsilon$ that never loses has 
small variation distance from $D$. Fix a distribution $D_j$ such that $d_{TV}(D_j, D) > 6\epsilon$; Claim 17(i) implies that 
$D_j$ loses to $D_i$ with probability $1 - 2e^{-m\epsilon^2/8} \ge 1 - \delta/(2N)$. A union bound yields that with probability 
$1 - \delta/2$, every distribution $D_j$ that has $d_{TV}(D_j, D) > 6\epsilon$ loses some competition. 

Thus, with overall probability at least $1 - \delta$, the tournament does not output "failure" and outputs some 
distribution $D^*$ such that $d_{TV}(D, D^*)$ is at most $6\epsilon$. The proof of Proposition 12 is now complete. □ 

Remark 18. As stated, Proposition 12 assumes that algorithm $\mathcal{T}^D$ has access to samplers for all the 
distributions $D_k$, so each call to such a sampler is guaranteed to output an element distributed according to 
$D_k$. Let $D'_k$ be a distribution over $W \cup \{\bot\}$ which is such that (i) $D'_k(\bot) \le 1/2$, and (ii) the conditional 
distribution $(D'_k)_W$ of $D'_k$ conditioned on not outputting $\bot$ is precisely $D_k$. It is easy to see that the proof 
of Proposition 12 extends to a setting in which $\mathcal{T}^D$ has access to samplers for $D'_k$ rather than samplers for 
$D_k$; each time a sample from $D_k$ is required, the algorithm can simply invoke the sampler for $D'_k$ repeatedly 
until an element other than $\bot$ is obtained. (The low-probability event that many repetitions are ever needed 
can be "folded into" the failure probability $\delta$.) 

3 A general technique for inverse approximate uniform generation 

In this section we present a general technique for solving inverse approximate uniform generation problems. 
Our main positive results follow this conceptual framework. At the heart of our approach is a new type of 
algorithm which we call a densifier for a concept class $C$. Roughly speaking, this is an algorithm which, 
given uniform random positive examples of an unknown $f \in C$, constructs a set $S$ which (essentially) 
contains all of $f^{-1}(1)$ and which is such that $f^{-1}(1)$ is "dense" in $S$. Our main result in this section, 
Theorem 21, states (roughly speaking) that the existence of (i) a computationally efficient densifier, (ii) an 
efficient approximate uniform generation algorithm, (iii) an efficient approximate counting algorithm, and 
(iv) an efficient statistical query (SQ) learning algorithm, together suffice to yield an efficient algorithm for 
our inverse approximate uniform generation problem. 

We have already defined approximate uniform generation and approximate counting algorithms, so 
we need to define SQ learning algorithms and densifiers. The statistical query (SQ) learning model is a 
natural restriction of the PAC learning model in which a learning algorithm is allowed to obtain estimates 
of statistical properties of the examples but cannot directly access the examples themselves. Let $D$ be a 
distribution over $\{-1,1\}^n$. In the SQ model [Kea98], the learning algorithm has access to a statistical query 
oracle, STAT$(f, D)$, to which it can make a query of the form $(\chi, \tau)$, where $\chi : \{-1,1\}^n \times \{-1,1\} \to [-1,1]$ 
is the query function and $\tau > 0$ is the tolerance. The oracle responds with a value $v$ such that 
$|\mathbf{E}_{x \sim D}[\chi(x, f(x))] - v| \le \tau$, where $f \in C$ is the target concept. The goal of the algorithm is to output 
a hypothesis $h : \{-1,1\}^n \to \{-1,1\}$ such that $\Pr_{x \sim D}[h(x) \ne f(x)] \le \epsilon$. The following is a precise 
definition: 

Definition 19. Let $C$ be a class of $n$-variable Boolean functions and $D$ be a distribution over $\{-1,1\}^n$. An 
SQ learning algorithm for $C$ under $D$ is a randomized algorithm $\mathcal{A}^{C}_{SQ}$ that for every $\epsilon, \delta > 0$, every target 
concept $f \in C$, on input $\epsilon, \delta$ and with access to oracle STAT$(f, D)$ and to independent samples drawn from 
$D$, outputs with probability $1 - \delta$ a hypothesis $h : \{-1,1\}^n \to \{-1,1\}$ such that $\Pr_{x \sim D}[h(x) \ne f(x)] \le \epsilon$. 
Let $t_1(n, 1/\epsilon, 1/\delta)$ be the running time of $\mathcal{A}^{C}_{SQ}$ (assuming each oracle query is answered in unit time), $t_2(n)$ 
be the maximum running time to evaluate any query provided to STAT$(f, D)$ and $\tau(n, 1/\epsilon)$ be the minimum 
value of the tolerance parameter ever provided to STAT$(f, D)$ in the course of $\mathcal{A}^{C}_{SQ}$'s execution. We say 
that $\mathcal{A}^{C}_{SQ}$ is efficient (and that $C$ is efficiently SQ learnable with respect to distribution $D$) if $t_1(n, 1/\epsilon, 1/\delta)$ 
is polynomial in $n$, $1/\epsilon$ and $1/\delta$, $t_2(n)$ is polynomial in $n$, and $\tau(n, 1/\epsilon)$ is lower bounded by an inverse 
polynomial in $n$ and $1/\epsilon$. We call an SQ learning algorithm $\mathcal{A}^{C}_{SQ}$ for $C$ distribution independent if $\mathcal{A}^{C}_{SQ}$ 
succeeds for any distribution $D$. If $C$ has an efficient distribution independent SQ learning algorithm we 
say that $C$ is efficiently SQ learnable (distribution independently). 

We sometimes write an "$(\epsilon, \delta)$-SQ learning algorithm" to explicitly state the accuracy parameter $\epsilon$ and 
confidence parameter $\delta$. Throughout this paper, we will only deal with distribution independent SQ learning 
algorithms. 

To state our main result, we introduce the notion of a densifier for a class $C$ of Boolean functions. 
Intuitively, a densifier is an algorithm which is given access to samples from $\mathcal{U}_{f^{-1}(1)}$ (where $f$ is an unknown 
element of $C$) and outputs a subset $S \subseteq \{-1,1\}^n$ which is such that (i) $S$ contains "almost all" of $f^{-1}(1)$, 
but (ii) $S$ is "much smaller" than $\{-1,1\}^n$; in particular it is small enough that $f^{-1}(1) \cap S$ is (at least 
moderately) "dense" in $S$. 

Definition 20. Fix a function $\gamma(n, 1/\epsilon, 1/\delta)$ taking values in $(0, 1]$ and a class $C$ of $n$-variable Boolean 
functions. An algorithm $\mathcal{A}^{(C,C')}_{den}$ is said to be a $\gamma$-densifier for function class $C$ using class $C'$ if it has the 
following behavior: For every $\epsilon, \delta > 0$, every $1/2^n \le \hat{p} \le 1$, and every $f \in C$, given as input $\epsilon, \delta, \hat{p}$ and a set 
of independent samples from $\mathcal{U}_{f^{-1}(1)}$, the following holds: Let $p \stackrel{\mathrm{def}}{=} \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$. If $p \le \hat{p} < (1+\epsilon)p$, 
then with probability at least $1 - \delta$, algorithm $\mathcal{A}^{(C,C')}_{den}$ outputs a function $g \in C'$ such that: 

(a) $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon$. 

(b) $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \gamma(n, 1/\epsilon, 1/\delta)$. 

We will sometimes write an "$(\epsilon, \gamma, \delta)$-densifier" to explicitly state the parameters in the definition. 
Our main conceptual approach is summarized in the following theorem: 

Theorem 21 (General Upper Bound). Let $C, C'$ be classes of $n$-variable Boolean functions. Suppose that 

• $\mathcal{A}^{(C,C')}_{den}$ is an $(\epsilon, \gamma, \delta)$-densifier for $C$ using $C'$ running in time $T_{den}(n, 1/\epsilon, 1/\delta)$. 

• $\mathcal{A}^{C'}_{gen}$ is an $(\epsilon, \delta)$-approximate uniform generation algorithm for $C'$ running in time $T_{gen}(n, 1/\epsilon, 1/\delta)$. 

• $\mathcal{A}^{C'}_{count}$ is an $(\epsilon, \delta)$-approximate counting algorithm for $C'$ running in time $T_{count}(n, 1/\epsilon, 1/\delta)$. 

• $\mathcal{A}^{C}_{SQ}$ is an $(\epsilon, \delta)$-SQ learning algorithm for $C$ such that: $\mathcal{A}^{C}_{SQ}$ runs in time $t_1(n, 1/\epsilon, 1/\delta)$, $t_2(n)$ is the 
maximum time needed to evaluate any query provided to STAT$(f, D)$, and $\tau(n, 1/\epsilon)$ is the minimum 
value of the tolerance parameter ever provided to STAT$(f, D)$ in the course of $\mathcal{A}^{C}_{SQ}$'s execution. 

Then there exists an inverse approximate uniform generation algorithm $\mathcal{A}^{C}_{inv}$ for $C$. The running time of 
$\mathcal{A}^{C}_{inv}$ is polynomial in $T_{den}(n, 1/\epsilon, 1/\delta)$, $1/\gamma$, $T_{gen}(n, 1/\epsilon, 1/\delta)$, $T_{count}(n, 1/\epsilon, 1/\delta)$, $t_1(n, 1/\epsilon, 1/\delta)$, $t_2(n)$ 
and $1/\tau(n, 1/\epsilon)$. 

Sketch of the algorithm. The inverse approximate uniform generation algorithm $\mathcal{A}^{C}_{inv}$ for $C$ works in three 
main conceptual steps. Let $f \in C$ be the unknown target function and recall that our algorithm $\mathcal{A}^{C}_{inv}$ is given 
access to samples from $\mathcal{U}_{f^{-1}(1)}$. 

(1) In the first step, $\mathcal{A}^{C}_{inv}$ runs the densifier $\mathcal{A}^{(C,C')}_{den}$ on a set of samples from $\mathcal{U}_{f^{-1}(1)}$. Let $g \in C'$ be the 
output function of $\mathcal{A}^{(C,C')}_{den}$. 

Note that by setting the input to the approximate uniform generation algorithm $\mathcal{A}^{C'}_{gen}$ to $g$, we obtain an 
approximate sampler $C_g$ for $\mathcal{U}_{g^{-1}(1)}$. The output distribution $D'$ of this sampler is by definition supported 
on $g^{-1}(1)$ and is close to $D = \mathcal{U}_{g^{-1}(1)}$ in total variation distance. 

(2) The second step is to run the SQ-algorithm $\mathcal{A}^{C}_{SQ}$ to learn the function $f \in C$ under the distribution $D$. 
Let $h$ be the hypothesis constructed by $\mathcal{A}^{C}_{SQ}$. 

(3) In the third and final step, the algorithm simply samples from $C_g$ until it obtains an example $x$ that 
has $h(x) = 1$, and outputs this $x$. 
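Step (3) of the sketch is plain rejection sampling, which can be illustrated as follows. The names `sample_Cg` and `h` are placeholders for the sampler $C_g$ and the hypothesis produced in Step (2), and the retry cap `max_tries` is an assumption added here so that the loop always terminates; it is not part of the sketch above.

```python
import random

def sample_until_accept(sample_Cg, h, max_tries=10_000):
    """Draw from the sampler C_g until the hypothesis h accepts.

    sample_Cg: zero-argument callable returning a point x (a stand-in
               for the approximate sampler for the uniform distribution
               over g^{-1}(1)).
    h:         callable mapping x to +1 or -1 (the learned hypothesis).
    Returns an accepted x, or None if no draw was accepted in max_tries.
    """
    for _ in range(max_tries):
        x = sample_Cg()
        if h(x) == 1:
            return x
    return None
```

The expected number of draws is the inverse of the probability that $h$ accepts a sample from $C_g$, which is why it matters that $f^{-1}(1)$ is dense in $g^{-1}(1)$.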

Remark 22. The reader may have noticed that the above sketch does not seem to use the approximate 
counting algorithm $\mathcal{A}^{C'}_{count}$; we will revisit this point below. 

Remark 23. The connection between the above algorithm sketch and the "standard approach" discussed in 
the Introduction is as follows: The function $g \wedge h$ essentially corresponds to the reconstructed object $x$ of 
the "standard approach." The process of sampling from $C_g$ and doing rejection sampling until an input that 
satisfies $h$ is obtained essentially corresponds to the Sample procedure of the "standard approach." 

3.1 Intuition, motivation and discussion. To motivate the high-level idea behind our algorithm, consider 
a setting in which $f^{-1}(1)$ is only a tiny fraction (say $1/2^{\Theta(n)}$) of $\{-1,1\}^n$. It is intuitively clear that we 
would like to use some kind of a learning algorithm in order to come up with a good approximation of 
$f^{-1}(1)$, but we need this approximation to be accurate at the "scale" of $f^{-1}(1)$ itself rather than at the scale 
of all of $\{-1,1\}^n$, so we need some way to ensure that the learning algorithm's hypothesis is accurate at this 
small scale. By using a densifier to construct $g$ such that $g^{-1}(1)$ is not too much larger than $f^{-1}(1)$, we can 
use the distribution $D = \mathcal{U}_{g^{-1}(1)}$ to run a learning algorithm and obtain a good approximation of $f^{-1}(1)$ at 
the desired scale. (Since $D$ and $D'$ are close in variation distance, this implies we also learn $f$ with respect 
to $D'$.) 

To motivate our use of an SQ learning algorithm rather than a standard PAC learning algorithm, observe 
that there seems to be no way to obtain correctly labeled examples distributed according to $D$. However, 
we show that it is possible to accurately simulate statistical queries under $D$ having access only to random 
positive examples from $f^{-1}(1)$ and to unlabeled examples drawn from $D$ (subject to additional technical 
caveats discussed below). We discuss the issue of how it is possible to successfully use an SQ learner in our 
setting in more detail below. 

Discussion and implementation issues. While the three main conceptual steps (1)-(3) of our algorithm 
may (hopefully) seem quite intuitive in light of the preceding motivation, a few issues immediately arise 
in thinking about how to implement these steps. The first one concerns running the SQ-algorithm $\mathcal{A}^{C}_{SQ}$ in 
Step 2 to learn $f$ under distribution $D$ (recall that $D = \mathcal{U}_{g^{-1}(1)}$ and is close to $D'$). Our algorithm $\mathcal{A}^{C}_{inv}$ 
needs to be able to efficiently simulate $\mathcal{A}^{C}_{SQ}$ given its available information. While it would be easy to do 
so given access to random labeled examples $(x, f(x))$, where $x \sim D$, such information is not available in 
our setting. To overcome this obstacle, we show (see Proposition 25) that for any samplable distribution $D$, 
we can efficiently simulate a statistical query algorithm under $D$ using samples from $D_{f,+}$. This does not 
quite solve the problem, since we only have samples from $\mathcal{U}_{f^{-1}(1)}$. However, we show (see Claim 28) that 
for our setting, i.e., for $D = \mathcal{U}_{g^{-1}(1)}$, we can simulate a sample from $D_{f,+}$ by a simple rejection sampling 
procedure using samples from $\mathcal{U}_{f^{-1}(1)}$ and query access to $g$. 

Footnote 2 (on the running time of $\mathcal{A}^{C}_{inv}$ in Theorem 21): It is straightforward to derive an explicit running time bound for $\mathcal{A}^{C}_{inv}$ in terms of the above functions from our analysis, but 
the resulting expression is extremely long and rather uninformative so we do not provide it. 

Some more issues remain to be handled. First, the simulation of the statistical query algorithm sketched 
in the previous paragraph only works under the assumption that we are given a sufficiently accurate 
approximation $\tilde{b}_f$ of the probability $\Pr_{x \sim D}[f(x) = 1]$. (Intuitively, the error of our approximation should be smaller than 
the smallest tolerance $\tau$ provided to the statistical query oracle by the algorithm $\mathcal{A}^{C}_{SQ}$.) Second, by 
Definition 20, the densifier only succeeds under the assumption that it is given in its input a $(1+\epsilon)$-multiplicative 
approximation $\hat{p}$ to $p = \Pr_{x \in \mathcal{U}_n}[f(x) = 1]$. 

We handle these issues as follows: First, we show (see Claim 29) that, given an accurate estimate $\hat{p}$ and 
a "dense" function $g \in C'$, we can use the approximate counting algorithm $\mathcal{A}^{C'}_{count}$ to efficiently compute an 
accurate estimate $\tilde{b}_f$. (This is one reason why Theorem 21 requires an approximate counting algorithm for 
$C'$.) To deal with the fact that we do not a priori have an accurate estimate $\hat{p}$, we run our sketched algorithm 
for all possible values of $\Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$ in an appropriate multiplicative "grid" of size $N = O(n/\epsilon)$, 
covering all possible values from $1/2^n$ to $1$. We thus obtain a set $\mathcal{D}$ of $N$ candidate distributions, one of 
which is guaranteed to be close to the true distribution $\mathcal{U}_{f^{-1}(1)}$ in variation distance. At this point, we would 
like to apply our hypothesis testing machinery (Proposition 12) to find such a distribution. However, in 
order to use Proposition 12, in addition to sample access to the candidate distributions (and the distribution 
being learned), we also require a multiplicatively accurate approximate evaluation oracle to evaluate the 
probability mass of any point under the candidate distributions. We show (see Lemma 39) that this is 
possible in our generic setting, using properties of the densifier and the approximate counting algorithm 
$\mathcal{A}^{C'}_{count}$ for $C'$. 
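The multiplicative grid of guesses for $\Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$ mentioned above can be built in a few lines. This is a sketch under the assumption that the grid consists of the powers $2^{-n}(1+\epsilon)^k$ below $1$ together with the value $1$; the function name `density_grid` is hypothetical, and the exact grid used in the detailed proof may differ.

```python
def density_grid(n, eps):
    """Return guesses p_hat covering [2^-n, 1] multiplicatively.

    For every p in [2^-n, 1] some grid value p_hat satisfies
    p <= p_hat < (1 + eps) * p, which is the precondition the
    densifier places on its input guess.
    """
    grid = []
    p_hat = 2.0 ** (-n)
    while p_hat < 1.0:
        grid.append(p_hat)
        p_hat *= 1.0 + eps
    grid.append(1.0)  # largest possible density
    return grid
```

The grid has roughly $n \ln 2 / \ln(1+\epsilon) = O(n/\epsilon)$ points for $\epsilon \le 1$, matching the size $N$ quoted in the text.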

Now we are ready to begin the detailed proof of Theorem 21. 

3.2 Simulating statistical query algorithms. Our algorithm $\mathcal{A}^{C}_{inv}$ will need to simulate a statistical query 
algorithm for $C$, with respect to a specific distribution $D$. Note, however, that $\mathcal{A}^{C}_{inv}$ only has access to 
uniform positive examples of $f \in C$, i.e., samples from $\mathcal{U}_{f^{-1}(1)}$. Hence we need to show that a statistical 
query algorithm can be efficiently simulated in such a setting. To do this it suffices to show that one can 
efficiently provide valid responses to queries to the statistical query oracle STAT$(f, D)$, i.e., that one can 
simulate the oracle. Assuming this can be done, the simulation algorithm $\mathcal{A}_{SQ\text{-}SIM}$ is very simple: Run the 
statistical query algorithm $\mathcal{A}_{SQ}$, and whenever it makes a query to STAT$(f, D)$, simulate it. To this end, in 
the following lemma we describe a procedure that simulates an SQ oracle. (Our approach here is similar to 
that of earlier simulation procedures that have been given in the literature, see e.g. Denis et al. [DGL05].) 

Lemma 24. Let $C$ be a concept class over $\{-1,1\}^n$, $f \in C$, and $D$ be a samplable distribution over 
$\{-1,1\}^n$. There exists an algorithm Simulate-STAT$^D_f$ with the following properties: It is given access 
to independent samples from $D_{f,+}$, and takes as input a number $\tilde{b}_f \in [0, 1]$, a $t(n)$-time computable query 
function $\chi : \{-1,1\}^n \times \{-1,1\} \to [-1,1]$, a tolerance $\tau$ and a confidence $\delta$. It has the following behavior: 
it uses $m = O\big((1/\tau^2) \log(1/\delta)\big)$ samples from $D$ and $D_{f,+}$, runs in time $O(m \cdot t(n))$, and if 
$|\tilde{b}_f - \Pr_{x \sim D}[f(x) = 1]| \le \tau'$, then with probability $1 - \delta$ it outputs a number $v$ such that 

$|\mathbf{E}_{x \sim D}[\chi(x, f(x))] - v| \le \tau + \tau'.$ (4) 

Proof. To prove the lemma, we start by rewriting the expectation in (4) as follows: 

$\mathbf{E}_{x \sim D}[\chi(x, f(x))] = \mathbf{E}_{x \sim D_{f,+}}[\chi(x, 1)] \cdot \Pr_{x \sim D}[f(x) = 1] + \mathbf{E}_{x \sim D_{f,-}}[\chi(x, -1)] \cdot \Pr_{x \sim D}[f(x) = -1].$ 

We also observe that 

$\mathbf{E}_{x \sim D}[\chi(x, -1)] = \mathbf{E}_{x \sim D_{f,+}}[\chi(x, -1)] \cdot \Pr_{x \sim D}[f(x) = 1] + \mathbf{E}_{x \sim D_{f,-}}[\chi(x, -1)] \cdot \Pr_{x \sim D}[f(x) = -1].$ 

Combining the above equalities we get 

$\mathbf{E}_{x \sim D}[\chi(x, f(x))] = \mathbf{E}_{x \sim D}[\chi(x, -1)] + \mathbf{E}_{x \sim D_{f,+}}[\chi(x, 1) - \chi(x, -1)] \cdot \Pr_{x \sim D}[f(x) = 1].$ (5) 

Given the above identity, the algorithm Simulate-STAT$^D_f$ is very simple: We use random sampling from 
$D$ to empirically estimate the expectation $\mathbf{E}_{x \sim D}[\chi(x, -1)]$ (recall that $D$ is assumed to be a samplable 
distribution), and we use the independent samples from $D_{f,+}$ to empirically estimate $\mathbf{E}_{x \sim D_{f,+}}[\chi(x, 1) - \chi(x, -1)]$. 
Both estimates are obtained to within an additive accuracy of $\pm\tau/2$ (with confidence probability $1 - \delta/2$ 
each). We combine these estimates with our estimate $\tilde{b}_f$ for $\Pr_{x \sim D}[f(x) = 1]$ in the obvious way (see 
Step 2 of pseudocode below). 



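Subroutine Simulate-STAT$^D_f(D, D_{f,+}, \chi, \tau, \tilde{b}_f, \delta)$: 

Input: Independent samples from $D$ and $D_{f,+}$, query access to $\chi : \{-1,1\}^n \times \{-1,1\} \to [-1,1]$, accuracy $\tau$, 
$\tilde{b}_f \in [0, 1]$ and confidence $\delta$. 

Output: If $|\tilde{b}_f - \Pr_{x \sim D}[f(x) = 1]| \le \tau'$, a number $v$ that with probability $1 - \delta$ satisfies 
$|\mathbf{E}_{x \sim D}[\chi(x, f(x))] - v| \le \tau + \tau'$. 

1. Empirically estimate the values $\mathbf{E}_{x \sim D}[\chi(x, -1)]$ and $\mathbf{E}_{x \sim D_{f,+}}[\chi(x, 1) - \chi(x, -1)]$ to within an 
additive $\pm\tau/2$ with confidence probability $1 - \delta/2$. Let $\tilde{E}_1, \tilde{E}_2$ be the corresponding estimates. 

2. Output $v = \tilde{E}_1 + \tilde{E}_2 \cdot \tilde{b}_f$. 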



By Fact 16, we can estimate each expectation using $m = \Theta\big((1/\tau^2) \log(1/\delta)\big)$ samples (from $D$ and $D_{f,+}$, 
respectively). For each such sample the estimation algorithm needs to evaluate the function $\chi$ (once for 
the first expectation and twice for the second). Hence, the total number of queries to $\chi$ is $O(m)$, i.e., the 
subroutine Simulate-STAT$^D_f$ runs in time $O(m \cdot t(n))$ as desired. 

By a union bound, with probability $1 - \delta$ both estimates will be $\pm\tau/2$ accurate. The bound (4) follows 
from this latter fact and (5) by a straightforward application of the triangle inequality. This completes the 
proof of Lemma 24. □ 
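The simulation of a single statistical query via identity (5) can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: `sample_D` and `sample_D_plus` are hypothetical callables producing draws from $D$ and $D_{f,+}$, and the sample-size constant is a conservative Hoeffding-style choice that accounts for the second estimated quantity ranging over $[-2, 2]$.

```python
import math
import random

def simulate_stat(sample_D, sample_D_plus, chi, tau, b_f, delta):
    """Answer the query (chi, tau) using identity (5).

    sample_D:      draws x ~ D (unlabeled).
    sample_D_plus: draws x ~ D conditioned on f(x) = 1.
    chi:           query function chi(x, y) with values in [-1, 1].
    b_f:           estimate of Pr_{x ~ D}[f(x) = 1].
    """
    # Enough samples to get each expectation within tau/2 with
    # confidence 1 - delta/2 (chi(x,1) - chi(x,-1) lies in [-2, 2]).
    m = math.ceil((32.0 / tau ** 2) * math.log(4.0 / delta))

    # E1 estimates E_{x ~ D}[chi(x, -1)].
    e1 = sum(chi(sample_D(), -1) for _ in range(m)) / m

    # E2 estimates E_{x ~ D_{f,+}}[chi(x, 1) - chi(x, -1)].
    total = 0.0
    for _ in range(m):
        x = sample_D_plus()
        total += chi(x, 1) - chi(x, -1)
    e2 = total / m

    # Step 2 of the pseudocode: combine with the estimate of Pr[f = 1].
    return e1 + e2 * b_f
```

Note that no labeled example $(x, f(x))$ is ever needed: the label information enters only through the positive samples and the scalar `b_f`.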

Given the above lemma, we can state and prove our general result for simulating SQ algorithms: 

Proposition 25. Let $C$ be a concept class and $D$ be a samplable distribution over $\{-1,1\}^n$. Suppose there 
exists an SQ-learning algorithm $\mathcal{A}_{SQ}$ for $C$ under $D$ with the following performance: $\mathcal{A}_{SQ}$ runs in time 
$T_1 = t_1(n, 1/\epsilon, 1/\delta)$, each query provided to STAT$(f, D)$ can be evaluated in time $T_2 = t_2(n)$, and the 
minimum value of the tolerance provided to STAT$(f, D)$ in the course of its execution is $\tau = \tau(n, 1/\epsilon)$. 
Then, there exists an algorithm $\mathcal{A}_{SQ\text{-}SIM}$ that is given access to 

(i) independent samples from $D_{f,+}$; and 

(ii) a number $\tilde{b}_f \in [0, 1]$, 

and efficiently simulates the behavior of $\mathcal{A}_{SQ}$. In particular, $\mathcal{A}_{SQ\text{-}SIM}$ has the following performance 
guarantee: on input an accuracy $\epsilon$ and a confidence $\delta$, it uses $m = O\big((1/\tau^2) \cdot \log(T_1/\delta) \cdot T_1\big)$ samples from $D$ 
and $D_{f,+}$, and runs in time $T_{SQ\text{-}SIM} = O(m T_2)$, and if $|\tilde{b}_f - \Pr_{x \sim D}[f(x) = 1]| \le \tau/2$ then with probability 
$1 - \delta$ it outputs a hypothesis $h : \{-1,1\}^n \to \{-1,1\}$ such that $\Pr_{x \sim D}[h(x) \ne f(x)] \le \epsilon$. 

Proof. The simulation procedure is very simple. We run the algorithm $\mathcal{A}_{SQ}$ by simulating its queries using 
algorithm Simulate-STAT$^D_f$. The algorithm is described in the following pseudocode: 



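Algorithm $\mathcal{A}_{SQ\text{-}SIM}(D, D_{f,+}, \epsilon, \tilde{b}_f, \delta)$: 

Input: Independent samples from $D$ and $D_{f,+}$, $\tilde{b}_f \in [0, 1]$, $\epsilon, \delta > 0$. 

Output: If $|\tilde{b}_f - \Pr_{x \sim D}[f(x) = 1]| \le \tau/2$, a hypothesis $h$ that with probability $1 - \delta$ satisfies 
$\Pr_{x \sim D}[h(x) \ne f(x)] \le \epsilon$. 

1. Let $\tau = \tau(n, 1/\epsilon)$ be the minimum accuracy ever used in a query to STAT$(f, D)$ during the 
execution of $\mathcal{A}_{SQ}(\epsilon, \delta/2)$. 

2. Run the algorithm $\mathcal{A}_{SQ}(\epsilon, \delta/2)$, by simulating each query to STAT$(f, D)$ as follows: 
whenever $\mathcal{A}_{SQ}$ makes a query $(\chi, \tau)$ to STAT$(f, D)$, the simulation algorithm runs 
Simulate-STAT$^D_f(D, D_{f,+}, \chi, \tau/2, \tilde{b}_f, \delta/(2T_1))$, whose guarantee (4) is applied with $\tau' = \tau/2$. 

3. Output the hypothesis $h$ obtained by the simulation. 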



Note that we run the algorithm $\mathcal{A}_{SQ}$ with confidence probability $1 - \delta/2$. Moreover, each query to 
the STAT$(f, D)$ oracle is simulated with confidence $1 - \delta/(2T_1)$. Since $\mathcal{A}_{SQ}$ runs for at most $T_1$ time 
steps, it certainly performs at most $T_1$ queries in total. Hence, by a union bound over these events, with 
probability $1 - \delta/2$ all answers to its queries will be accurate to within an additive $\pm\tau/2$. By the guarantee 
of algorithm $\mathcal{A}_{SQ}$ and a union bound, with probability $1 - \delta$, the algorithm $\mathcal{A}_{SQ\text{-}SIM}$ will output a hypothesis 
$h : \{-1,1\}^n \to \{-1,1\}$ such that $\Pr_{x \sim D}[h(x) \ne f(x)] \le \epsilon$. The sample complexity and running time 
follow from the bounds for Simulate-STAT$^D_f$. This completes the proof of Proposition 25. □ 

Proposition 25 tells us we can efficiently simulate a statistical query algorithm for a concept class $\mathcal{C}$ under a samplable distribution $D$ if we have access to samples drawn from $D_{f,+}$ (and a very accurate estimate of $\Pr_{x \sim D}[f(x) = 1]$). In our setting, we have that $D = \mathcal{U}_{g^{-1}(1)}$ where $g \in \mathcal{C}'$ is the function that is output by $\mathcal{A}^{(\mathcal{C},\mathcal{C}')}_{den}$. So, the two issues we must handle are (i) obtaining samples from $D$, and (ii) obtaining samples from $D_{f,+}$.

For (i), we note that, even though we do not have access to samples drawn exactly from $D$, it suffices for our purposes to use a $\tau'$-sampler for $D$ for a sufficiently small $\tau'$. To see this we use the following fact:

Fact 26. Let $D, D'$ be distributions over $\{-1,1\}^n$ with $d_{TV}(D, D') \le \tau'$. Then for any bounded function $\phi : \{-1,1\}^n \to [-1,1]$ we have that $|\mathbf{E}_{x \sim D}[\phi(x)] - \mathbf{E}_{x \sim D'}[\phi(x)]| \le 2\tau'$.

Proof. By definition we have that

\[
\begin{aligned}
\big|\mathbf{E}_{x \sim D}[\phi(x)] - \mathbf{E}_{x \sim D'}[\phi(x)]\big|
&= \Big|\sum_{x \in \{-1,1\}^n} (D(x) - D'(x))\,\phi(x)\Big| \\
&\le \sum_{x \in \{-1,1\}^n} |D(x) - D'(x)|\,|\phi(x)| \\
&\le \max_{x \in \{-1,1\}^n} |\phi(x)| \cdot \sum_{x \in \{-1,1\}^n} |D(x) - D'(x)| \\
&\le 1 \cdot \|D - D'\|_1 \\
&= 2\, d_{TV}(D, D') \\
&\le 2\tau'
\end{aligned}
\]

as desired. □
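As a sanity check on Fact 26, the following minimal Python sketch draws random pairs of distributions over $\{-1,1\}^n$ together with random bounded test functions and confirms that the gap in expectations never exceeds twice the total variation distance. The helper names (`tv_distance`, `check_fact_26`) and the way random distributions are generated are illustrative assumptions, not part of the paper.

```python
import itertools
import random


def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def expectation(dist, phi):
    """Expectation of phi (a dict point -> value) under dist."""
    return sum(prob * phi[x] for x, prob in dist.items())


def random_distribution(points, rng):
    """A random probability distribution over the given points (illustrative)."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}


def check_fact_26(n=3, trials=200, seed=0):
    """Return True if |E_D[phi] - E_D'[phi]| <= 2 d_TV(D, D') in every trial."""
    rng = random.Random(seed)
    points = list(itertools.product((-1, 1), repeat=n))
    for _ in range(trials):
        d1 = random_distribution(points, rng)
        d2 = random_distribution(points, rng)
        phi = {x: rng.uniform(-1.0, 1.0) for x in points}  # bounded in [-1, 1]
        gap = abs(expectation(d1, phi) - expectation(d2, phi))
        if gap > 2.0 * tv_distance(d1, d2) + 1e-12:
            return False
    return True
```

The check is exhaustive over the domain for each sampled pair, so it is only practical for small $n$; it illustrates the inequality rather than proving it.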



The above fact implies that the statement of Proposition 25 continues to hold with the same parameters if instead of a 0-sampler for $D$ we have access to a $\tau'$-sampler for $D$, for $\tau' = \tau/8$. The only difference is that in Step 1 of the subroutine Simulate-STAT$^D_f$ we empirically estimate the expectation $\mathbf{E}_{x \sim D'}[\chi(x, -1)]$ up to an additive $\pm\tau/4$. By Fact 26, this will be a $\pm(\tau/4 + 2\tau') = \pm\tau/2$ accurate estimate for $\mathbf{E}_{x \sim D}[\chi(x, -1)]$. That is, we have:

Corollary 27. The statement of Proposition 25 continues to hold with the same parameters if instead of a 0-sampler for $D$ we have access to a $\tau' = \tau/8$-sampler for $D$.

For (ii), even though we do not have access to the distribution $D = \mathcal{U}_{g^{-1}(1)}$ directly, we note below that we can efficiently sample from $D_{f,+}$ using samples from $\mathcal{U}_{f^{-1}(1)}$ together with evaluations of $g$ (recall again that $g$ is provided as the output of the densifier).

Claim 28. Let $g : \{-1,1\}^n \to \{-1,1\}$ be a $t_g(n)$-time computable function such that $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge \epsilon'$. There is an efficient subroutine that is given $\epsilon'$ and a circuit to compute $g$ as input, uses $m = O((1/\epsilon') \log(1/\delta))$ samples from $\mathcal{U}_{f^{-1}(1)}$, runs in time $O(m \cdot t_g(n))$, and with probability $1 - \delta$ outputs a sample $x$ such that $x \sim D_{f,+}$, where $D = \mathcal{U}_{g^{-1}(1)}$.

Proof. To simulate a sample from $D_{f,+}$ we simply draw samples from $\mathcal{U}_{f^{-1}(1)}$ until we obtain a sample $x$ with $g(x) = 1$. The following pseudocode makes this precise:

Subroutine Simulate-sample$^{D_{f,+}}(\mathcal{U}_{f^{-1}(1)}, g, \epsilon', \delta)$:

Input: Independent samples from $\mathcal{U}_{f^{-1}(1)}$, a circuit computing $g$, a value $\epsilon' > 0$ such that $\epsilon' \le \Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1]$ and confidence parameter $\delta$.

Output: A point $x \in \{-1,1\}^n$ that with probability $1 - \delta$ satisfies $x \sim D_{f,+}$.

1. Repeat the following at most $m = \Theta((1/\epsilon') \log(1/\delta))$ times:

   (a) Draw a sample $x \sim \mathcal{U}_{f^{-1}(1)}$.

   (b) If the circuit for $g$ evaluates to 1 on input $x$ then output $x$.

2. If no point $x$ with $g(x) = 1$ has been obtained, halt and output "failure."

Since $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge \epsilon'$, after repeating this process $m = \Omega((1/\epsilon') \log(1/\delta))$ times, we will obtain a satisfying assignment to $g$ with probability at least $1 - \delta$. It is clear that such a sample $x$ is distributed according to $D_{f,+}$. For each sample we need to evaluate $g$ once, hence the running time follows. □
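The subroutine above is plain rejection sampling, and can be sketched in a few lines of Python. This is an illustrative sketch only: the callable `draw_from_f` (standing in for sample access to $\mathcal{U}_{f^{-1}(1)}$), the return value `None` for "failure", and the constant `c` hidden in the $\Theta(\cdot)$ are assumptions made for the example.

```python
import math


def simulate_sample(draw_from_f, g, eps_prime, delta, c=1.0):
    """Rejection-sampling sketch of Simulate-sample.

    draw_from_f: zero-argument callable returning a uniform satisfying
                 assignment of f (hypothetical stand-in for the sample oracle).
    g:           callable mapping a point to +1 or -1.
    eps_prime:   lower bound on Pr_{x ~ U_{f^{-1}(1)}}[g(x) = 1].
    Returns a point x with g(x) = 1, or None to signal "failure".
    """
    # With c = 1 the failure probability is (1 - eps')^m <= exp(-eps' * m) <= delta.
    m = max(1, math.ceil(c * (1.0 / eps_prime) * math.log(1.0 / delta)))
    for _ in range(m):
        x = draw_from_f()
        if g(x) == 1:
            return x  # x is uniform over f^{-1}(1) intersected with g^{-1}(1)
    return None
```

Each iteration costs one evaluation of $g$, matching the $O(m \cdot t_g(n))$ running time in Claim 28.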

Getting a good estimate $\tilde{b}_f$ of $\Pr_{x \sim D}[f(x) = 1]$. The simulations presented above require an additively accurate estimate $\tilde{b}_f$ of $\Pr_{x \sim D}[f(x) = 1]$. We now show that in our context, such an estimate can be easily obtained if we have access to a good estimate $\hat{p}$ of $p = \Pr_{x \in \mathcal{U}_n}[f(x) = 1]$, using the fact that we have an efficient approximate counting algorithm for $\mathcal{C}'$ and that $D = \mathcal{U}_{g^{-1}(1)}$ where $g \in \mathcal{C}'$.

Claim 29. Let $g : \{-1,1\}^n \to \{-1,1\}$, $g \in \mathcal{C}'$, be a $t_g(n)$-time computable function, satisfying $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \gamma'$ and $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon'$. Let $\mathcal{A}^{\mathcal{C}'}_{count}$ be an $(\epsilon, \delta)$-approximate counting algorithm for $\mathcal{C}'$ running in time $T_{count}(n, 1/\epsilon, 1/\delta)$. There is a procedure Estimate-Bias with the following behavior: Estimate-Bias takes as input a value $0 < \hat{p} \le 1$, a parameter $\tau' > 0$, a confidence parameter $\delta'$, and a representation of $g \in \mathcal{C}'$. Estimate-Bias runs in time $O(t_g \cdot T_{count}(n, 2/\tau', 1/\delta'))$ and satisfies the following: if $p := \Pr_{x \sim \mathcal{U}_n}[f(x) = 1] \le \hat{p} < (1 + \epsilon')p$, then with probability $1 - \delta'$ Estimate-Bias outputs a value $\tilde{b}_f$ such that $|\tilde{b}_f - \Pr_{x \sim D}[f(x) = 1]| \le \tau'$.

Proof. The procedure Estimate-Bias is very simple. It runs $\mathcal{A}^{\mathcal{C}'}_{count}$ on inputs $\epsilon^* = \tau'/2$, $\delta'$, using the representation for $g \in \mathcal{C}'$. Let $p_g$ be the value returned by the approximate counter; Estimate-Bias returns $\hat{p}/p_g$.
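The procedure just described amounts to a single division. The Python sketch below makes this concrete on a toy instance; note that `exact_count_fraction` is a brute-force, exponential-time stand-in for the approximate counter (an assumption for illustration only, since the paper assumes an efficient $(\epsilon,\delta)$-approximate counter), and the function names are hypothetical.

```python
import itertools


def exact_count_fraction(g, n):
    """Brute-force stand-in for the approximate counter: returns |g^{-1}(1)| / 2^n.

    Exponential in n; used here only so the example is self-contained.
    """
    points = itertools.product((-1, 1), repeat=n)
    return sum(1 for x in points if g(x) == 1) / 2 ** n


def estimate_bias(p_hat, g, n, counter=exact_count_fraction):
    """Sketch of Estimate-Bias: return p_hat / p_g, an estimate of Pr_{x ~ U_{g^{-1}(1)}}[f(x) = 1]."""
    p_g = counter(g, n)
    return p_hat / p_g
```

For instance, with $n = 3$, $f(x) = 1$ iff $x_1 = x_2 = 1$ (so $p = 1/4$) and $g(x) = 1$ iff $x_1 = 1$ (so $p_g = 1/2$), the call `estimate_bias(0.25, g, 3)` returns $1/2$, which equals $|f^{-1}(1) \cap g^{-1}(1)| / |g^{-1}(1)| = 2/4$.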

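The claimed running time bound is obvious. To see that the procedure is correct, first observe that by Definition 8, with probability $1 - \delta'$ we have that

\[
\frac{|g^{-1}(1)|}{2^n} \cdot \frac{1}{1 + \epsilon^*} \le p_g \le \frac{|g^{-1}(1)|}{2^n} \cdot (1 + \epsilon^*).
\]

For the rest of the argument we assume that the above inequality indeed holds. Let $A$ denote $|g^{-1}(1)|$, let $B$ denote $|f^{-1}(1) \cap g^{-1}(1)|$, and let $C$ denote $|f^{-1}(1) \setminus g^{-1}(1)|$, so the true value $\Pr_{x \sim D}[f(x) = 1]$ equals $B/A$ and the above inequality can be rephrased as

\[
\frac{A}{1 + \epsilon^*} \le p_g \cdot 2^n \le A \cdot (1 + \epsilon^*).
\]

By our assumption on $\hat{p}$ we have that

\[
B + C \le \hat{p} \cdot 2^n \le (1 + \epsilon')(B + C);
\]

since $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon'$ we have

\[
\frac{C}{B + C} \le \epsilon' \quad \Big(\text{i.e., } C \le \frac{\epsilon'}{1 - \epsilon'} \cdot B\Big);
\]

and since $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \gamma'$ we have

\[
\frac{B}{A} \ge \gamma'.
\]

Combining these inequalities we get

\[
\frac{1}{1 + \epsilon^*} \cdot \frac{B}{A} \le \frac{1}{1 + \epsilon^*} \cdot \frac{B + C}{A} \le \frac{\hat{p}}{p_g} \le \frac{B}{A} \cdot (1 + \epsilon')(1 + \epsilon^*)\Big(1 + \frac{\epsilon'}{1 - \epsilon'}\Big) = \frac{B}{A} \cdot \frac{(1 + \epsilon')(1 + \epsilon^*)}{1 - \epsilon'}.
\]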
Hence 

-7 — — <*U + *--L-\<-*L<&, 

A Pg ~ A\ 1 + e* J - 1 + e* ~ ' 

where we have used $B \le A$. Recalling that $\epsilon^* = \tau'/2$, the lemma is proved. □

3.3 An algorithm that succeeds given the (approximate) bias of $f$. In this section, we present an algorithm ${\mathcal{A}'}^{\mathcal{C}}_{inv}(\epsilon, \delta, \hat{p})$ which, in addition to samples from $\mathcal{U}_{f^{-1}(1)}$, takes as input parameters $\epsilon, \delta, \hat{p}$. The algorithm succeeds in outputting a hypothesis distribution $D_f$ satisfying $d_{TV}(D_f, \mathcal{U}_{f^{-1}(1)}) \le \epsilon$ if the input parameter $\hat{p}$ is a multiplicatively accurate approximation to $\Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$. The algorithm follows the three high-level steps previously outlined and uses the subroutines of the previous subsection to simulate the statistical query algorithm. Detailed pseudocode follows:



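Algorithm ${\mathcal{A}'}^{\mathcal{C}}_{inv}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \hat{p})$:

Input: Independent samples from $\mathcal{U}_{f^{-1}(1)}$, accuracy and confidence parameters $\epsilon, \delta$, and a value $1/2^n < \hat{p} \le 1$.

Output: If $\Pr_{x \sim \mathcal{U}_n}[f(x) = 1] \le \hat{p} < (1 + \epsilon) \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$, with probability $1 - \delta$ outputs an $\epsilon$-sampler $C_f$ for $\mathcal{U}_{f^{-1}(1)}$.

1. [Run the densifier to obtain $g$]

   Fix $\epsilon_1 := \epsilon/6$ and $\gamma := \gamma(n, 1/\epsilon_1, 3/\delta)$. Run the $\gamma$-densifier $\mathcal{A}^{(\mathcal{C},\mathcal{C}')}_{den}(\epsilon_1, \delta/3, \hat{p})$ using random samples from $\mathcal{U}_{f^{-1}(1)}$. Let $g \in \mathcal{C}'$ be its output.

2. [Run the SQ-learner, using the approximate uniform generator for $g$, to obtain hypothesis $h$]

   (a) Fix $\epsilon_2 := \epsilon\gamma/7$, $\tau_2 := \tau(n, 1/\epsilon_2)$ and $m := \Theta\big((1/\tau_2^2) \cdot \log(T_1/\delta) \cdot T_1\big)$, where $T_1 = t_1(n, 1/\epsilon_2, 12/\delta)$.

   (b) Run the generator $\mathcal{A}^{\mathcal{C}'}_{gen}(g, \tau_2/8, \delta/(12m))$ $m$ times and let $S_D \subseteq \{-1,1\}^n$ be the multiset of samples obtained.

   (c) Run Simulate-sample$^{D_{f,+}}(\mathcal{U}_{f^{-1}(1)}, g, \gamma, \delta/(12m))$ $m$ times and let $S_{D_{f,+}} \subseteq \{-1,1\}^n$ be the multiset of samples obtained.

   (d) Run Estimate-Bias with parameters $\hat{p}$, $\tau' = \tau_2/2$, $\delta' = \delta/12$, using the representation for $g \in \mathcal{C}'$, and let $\tilde{b}_f$ be the value it returns.

   (e) Run $\mathcal{A}_{SQ\text{-}SIM}(S_D, S_{D_{f,+}}, \epsilon_2, \tilde{b}_f, \delta/12)$. Let $h : \{-1,1\}^n \to \{-1,1\}$ be the output hypothesis.

3. [Output the sampler which does rejection sampling according to $h$ on draws from the approximate uniform generator for $g$]

   Output the sampler $C_f$ which works as follows:

   For $i = 1$ to $t = \Theta((1/\gamma) \log(1/(\delta\epsilon)))$ do:

   (a) Set $\epsilon_3 := \epsilon\gamma/48000$.

   (b) Run the generator $\mathcal{A}^{\mathcal{C}'}_{gen}(g, \epsilon_3, \delta\epsilon/(12t))$ and let $x^{(i)}$ be its output.

   (c) If $h(x^{(i)}) = 1$, output $x^{(i)}$.

   If no $x^{(i)}$ with $h(x^{(i)}) = 1$ has been obtained, output the default element $\perp$.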

Let $\hat{D}$ denote the distribution over $\{-1,1\}^n \cup \{\perp\}$ for which $C_f$ is a 0-sampler, and let $\hat{D}'$ denote the conditional distribution of $\hat{D}$ restricted to $\{-1,1\}^n$ (i.e., excluding $\perp$).

We note that by inspection of the code for $C_f$, we have that the distribution $\hat{D}'$ is identical to $(D_{g,\epsilon_3})_{h^{-1}(1)}$, where $D_{g,\epsilon_3}$ is the distribution corresponding to the output of the approximate uniform generator when called on function $g$ and error parameter $\epsilon_3$ (see Definition 9) and $(D_{g,\epsilon_3})_{h^{-1}(1)}$ is $D_{g,\epsilon_3}$ conditioned on $h^{-1}(1)$.
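The sampler $C_f$ of Step 3 is again a rejection sampler, this time filtering draws from the approximate uniform generator for $g$ through the hypothesis $h$. A minimal Python sketch follows; the callable `generator` (standing in for one call to the approximate uniform generator on $g$), the use of `None` to represent $\perp$, the choice to skip a $\perp$ returned by the generator, and the constant `c` in the $\Theta(\cdot)$ are all assumptions made for illustration.

```python
import math

BOTTOM = None  # stands for the default element "bottom" in the pseudocode


def make_sampler(generator, h, gamma, eps, delta, c=4.0):
    """Build a sketch of the sampler C_f from Step 3.

    generator: zero-argument callable returning a point of g^{-1}(1) or BOTTOM
               (hypothetical stand-in for the approximate uniform generator).
    h:         callable mapping a point to +1 or -1.
    """
    t = max(1, math.ceil(c * (1.0 / gamma) * math.log(1.0 / (delta * eps))))

    def sampler():
        for _ in range(t):
            x = generator()
            if x is not BOTTOM and h(x) == 1:
                return x  # first generated point accepted by h
        return BOTTOM  # no draw satisfied h within t attempts

    return sampler
```

Conditioned on not returning `BOTTOM`, the output is a draw from the generator's distribution conditioned on $h^{-1}(1)$, which is the identity for $\hat{D}'$ noted above.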

We have the following: 

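Theorem 30. Let $p = \Pr_{x \in \mathcal{U}_n}[f(x) = 1]$. Algorithm ${\mathcal{A}'}^{\mathcal{C}}_{inv}(\epsilon, \delta, \hat{p})$ has the following behavior: If $p \le \hat{p} < (1 + \epsilon)p$, then with probability $1 - \delta$ the following both hold:

(i) the output $C_f$ is a sampler for a distribution $\hat{D}$ such that $d_{TV}(\hat{D}, \mathcal{U}_{f^{-1}(1)}) \le \epsilon$; and

(ii) the functions $h, g$ satisfy $|h^{-1}(1) \cap g^{-1}(1)| / |g^{-1}(1)| \ge \gamma/2$.

The running time of ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ is polynomial in $T_{den}(n, 1/\epsilon, 1/\delta)$, $T_{gen}(n, 1/\epsilon, 1/\delta)$, $T_{count}(n, 1/\epsilon, 1/\delta)$, $t_1(n, 1/\epsilon, 1/\delta)$, $t_2(n)$, $1/\tau(n, 1/\epsilon)$, and $1/\gamma(n, 1/\epsilon, 1/\delta)$.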

Proof. We give an intuitive explanation of the pseudocode in tandem with a proof of correctness. We argue that Steps 1-3 of the algorithm implement the corresponding steps of our high-level description and that the algorithm succeeds with confidence probability $1 - \delta$.

We assume throughout the argument that indeed $\hat{p}$ lies in $[p, (1 + \epsilon)p)$. Given this, by Definition 20, with probability $1 - \delta/3$ the function $g$ satisfies properties (a) and (b) of Definition 20, i.e., $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon_1$ and $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \gamma$. We condition on this event (which we denote $E_1$) going forth.

We now argue that Step 2 simulates the SQ learning algorithm $\mathcal{A}_{SQ}$ to learn the function $f \in \mathcal{C}$ under distribution $D = \mathcal{U}_{g^{-1}(1)}$ to accuracy $\epsilon_2$ with confidence $1 - \delta/3$. Note that the goal of Step (b) is to obtain $m$ samples from a distribution $D''$ (the distribution "$D_{g,\tau_2/8}$" of Definition 9) such that $d_{TV}(D'', D) \le \tau_2/8$. To achieve this, we call the approximate uniform generator for $g$ a total of $m$ times with failure probability $\delta/(12m)$ for each call (i.e., each call returns $\perp$ with probability at most $\delta/(12m)$). By a union bound, with failure probability at most $\delta/12$, all calls to the generator are successful and we obtain a set $S_D$ of $m$ independent samples from $D''$. Similarly, the goal of Step (c) is to obtain $m$ samples from $D_{f,+}$, and to achieve it we call the subroutine Simulate-sample$^{D_{f,+}}$ a total of $m$ times with failure probability $\delta/(12m)$ each. By Claim 28 and a union bound, with failure probability at most $\delta/12$, this step is successful, i.e., it gives a set $S_{D_{f,+}}$ of $m$ independent samples from $D_{f,+}$. The goal of Step (d) is to obtain a value $\tilde{b}_f$ satisfying $|\tilde{b}_f - \Pr_{x \sim D}[f(x) = 1]| \le \tau_2/2$; by Claim 29, with failure probability at most $\delta/12$ the value $\tilde{b}_f$ obtained in this step is as desired. Finally, Step (e) applies the simulation algorithm $\mathcal{A}_{SQ\text{-}SIM}$ using the samples $S_D$ and $S_{D_{f,+}}$ and the estimate $\tilde{b}_f$ of $\Pr_{x \sim D}[f(x) = 1]$ obtained in the previous steps. Conditioning on Steps (b), (c) and (d) being successful, Corollary 27 implies that Step (e) is successful with probability $1 - \delta/12$, i.e., it outputs a hypothesis $h$ that satisfies $\Pr_{x \sim D}[f(x) \ne h(x)] \le \epsilon_2$. A union bound over Steps (b), (c), (d) and (e), each of which fails with probability at most $\delta/12$, completes the analysis of Step 2. For future reference, we let $E_2$ denote the event that the hypothesis $h$ constructed in Step 2(e) has $\Pr_{x \sim D}[f(x) \ne h(x)] \le \epsilon_2$ (so we have that $E_2$ holds with probability at least $1 - \delta/3$; we additionally condition on this event going forth). We observe that since (as we have just shown) $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) \ne h(x)] \le \epsilon_2$ and $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \gamma$, we have $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[h(x) = 1] \ge \gamma - \epsilon_2 \ge \gamma/2$, which gives item (ii) of the theorem; so it remains to establish item (i) and the claimed running time bound.

To establish (i), we need to prove that the output distribution $\hat{D}$ of the sampler $C_f$ is $\epsilon$-close in total variation distance to $\mathcal{U}_{f^{-1}(1)}$. This sampler attempts to draw $t$ samples from a distribution $D'$ such that $d_{TV}(D', D) \le \epsilon_3$ (this is the distribution "$D_{g,\epsilon_3}$" in the notation of Definition 9) and it outputs one of these samples that satisfies $h$ (unless none of these samples satisfies $h$, in which case it outputs a default element $\perp$). The desired variation distance bound follows from the next lemma for our choice of parameters:

Lemma 31. Let $\hat{D}$ be the output distribution of ${\mathcal{A}'}^{\mathcal{C}}_{inv}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \hat{p})$. If $\Pr_{x \sim \mathcal{U}_n}[f(x) = 1] \le \hat{p} < (1 + \epsilon) \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$, then conditioned on Events $E_1$ and $E_2$, we have

\[
\begin{aligned}
d_{TV}(\hat{D}, \mathcal{U}_{f^{-1}(1)}) &\le \frac{\epsilon}{6} + \frac{\epsilon}{6} + \frac{4\epsilon_3}{\gamma} + \epsilon_1 + \frac{\epsilon_2}{2\gamma} + \frac{\epsilon_2}{\gamma - \epsilon_2} \\
&\le \frac{\epsilon}{6} + \frac{\epsilon}{6} + \frac{\epsilon}{12000} + \frac{\epsilon}{6} + \frac{\epsilon}{14} + \frac{\epsilon}{6} < \epsilon.
\end{aligned}
\]

Proof. Consider the distribution $D' = D_{g,\epsilon_3}$ (see Definition 9) produced by the approximate uniform generator in Step 3 of the algorithm. Let $D'|_{h^{-1}(1)}$ denote distribution $D'$ restricted to $h^{-1}(1)$. Let $S$ denote the set $g^{-1}(1) \cap h^{-1}(1)$. The lemma is an immediate consequence of Claims 32, 34, 35 and 36 below using the triangle inequality (everything below is conditioned on $E_1$ and $E_2$). □

Claim 32. $d_{TV}(\hat{D}, \hat{D}') \le \epsilon/6$.

Proof. Recall that $\hat{D}'$ is simply $\hat{D}$ conditioned on not outputting $\perp$.

We first claim that with probability at least $1 - \delta\epsilon/12$ all $t$ points drawn in Step 3 of the code for $C_f$ are distributed according to the distribution $D' = D_{g,\epsilon_3}$ over $g^{-1}(1)$. Each of the $t$ calls to the approximate uniform generator has failure probability $\delta\epsilon/(12t)$ (of outputting $\perp$ rather than a point distributed according to $D'$), so by a union bound no calls fail with probability at least $1 - \delta\epsilon/12$, and thus with probability at least $1 - \delta\epsilon/12$ indeed all $t$ samples are independently drawn from such a distribution $D'$.

Conditioned on this, we claim that a satisfying assignment for $h$ is obtained within the $t$ samples with probability at least $1 - \delta\epsilon/12$. This can be shown as follows:

Claim 33. Let $h : \{-1,1\}^n \to \{-1,1\}$ be the hypothesis output by $\mathcal{A}_{SQ\text{-}SIM}$. We have

\[
\Pr_{x \sim D'}[h(x) = 1] \ge \gamma/4.
\]

Proof. First recall that, by property (b) in the definition of the densifier (Definition 20), we have $\Pr_{x \sim D}[f(x) = 1] \ge \gamma$. Since $d_{TV}(D', D) \le \epsilon_3$, by definition we get

\[
\Pr_{x \sim D'}[f(x) = 1] \ge \Pr_{x \sim D}[f(x) = 1] - \epsilon_3 \ge \gamma - \epsilon_3 \ge 3\gamma/4.
\]

Now by the guarantee of Step 2 we have that $\Pr_{x \sim D}[f(x) \ne h(x)] \le \epsilon_2$. Combined with the fact that $d_{TV}(D', D) \le \epsilon_3$, this implies that

\[
\Pr_{x \sim D'}[f(x) \ne h(x)] \le \epsilon_2 + \epsilon_3 \le \gamma/2.
\]

Therefore, we conclude that

\[
\Pr_{x \sim D'}[h(x) = 1] \ge \Pr_{x \sim D'}[f(x) = 1] - \Pr_{x \sim D'}[f(x) \ne h(x)] \ge 3\gamma/4 - \gamma/2 = \gamma/4
\]

as desired. □

Hence, for an appropriate constant in the big-Theta specifying $t$, with probability at least $1 - \delta\epsilon/12 > 1 - \delta/12$ some $x^{(i)}$, $i \in [t]$, is a satisfying assignment of $h$, i.e., has $h(x^{(i)}) = 1$. Thus with overall failure probability at most $\delta\epsilon/6$ a draw from $\hat{D}$ is not $\perp$, and consequently we have $d_{TV}(\hat{D}, \hat{D}') \le \delta\epsilon/6 \le \epsilon/6$. □

Claim 34. $d_{TV}(\hat{D}', D'|_{h^{-1}(1)}) \le \epsilon/6$.

Proof. The probability that any of the $t$ points $x^{(1)}, \ldots, x^{(t)}$ is not drawn from $D'$ is at most $t \cdot \delta\epsilon/(12t) < \epsilon/12$. Assuming that this does not happen, the probability that no $x^{(i)}$ lies in $h^{-1}(1)$ is at most $(1 - \gamma/4)^t < \delta\epsilon/12 < \epsilon/12$ by Claim 33. Assuming this does not happen, the output of a draw from $\hat{D}$ is distributed identically according to $D'|_{h^{-1}(1)}$. Consequently we have that $d_{TV}(\hat{D}, D'|_{h^{-1}(1)}) \le \epsilon/6$ as claimed. □
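The bound $(1 - \gamma/4)^t < \delta\epsilon/12$ used above fixes the constant in $t = \Theta((1/\gamma)\log(1/(\delta\epsilon)))$. As a small worked example, the Python sketch below picks $t = \lceil (4/\gamma)\ln(12/(\delta\epsilon)) \rceil$, which suffices because $1 - x \le e^{-x}$; the function name and this particular choice of constant are illustrative assumptions rather than the paper's stated values.

```python
import math


def num_trials(gamma, eps, delta):
    """Number of draws t so that (1 - gamma/4)^t <= delta * eps / 12.

    Uses 1 - x <= exp(-x): (1 - gamma/4)^t <= exp(-gamma * t / 4),
    so t >= (4 / gamma) * ln(12 / (delta * eps)) is enough.
    """
    target = delta * eps / 12.0
    return math.ceil((4.0 / gamma) * math.log(1.0 / target))
```

The resulting $t$ grows linearly in $1/\gamma$ and only logarithmically in $1/(\delta\epsilon)$, consistent with the $\Theta(\cdot)$ expression in Step 3.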

Claim 35. $d_{TV}(D'|_{h^{-1}(1)}, \mathcal{U}_S) \le 4\epsilon_3/\gamma$.

Proof. The definition of an approximate uniform generator gives us that $d_{TV}(D', \mathcal{U}_{g^{-1}(1)}) \le \epsilon_3$, and Claim 33 gives that $\Pr_{x \sim D'}[h(x) = 1] \ge \gamma/4$. We now recall the fact that for any two distributions $D_1, D_2$ and any event $E$, writing $D_i|_E$ to denote distribution $D_i$ conditioned on event $E$, we have

\[
d_{TV}(D_1|_E, D_2|_E) \le \frac{d_{TV}(D_1, D_2)}{D_1(E)}.
\]

The claim follows since $\mathcal{U}_{g^{-1}(1)}|_{h^{-1}(1)}$ is equivalent to $\mathcal{U}_S$. □



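Claim 36. $d_{TV}(\mathcal{U}_S, \mathcal{U}_{f^{-1}(1)}) \le \epsilon_1 + \dfrac{\epsilon_2}{2\gamma} + \dfrac{\epsilon_2}{\gamma - \epsilon_2}$.

Proof. The proof requires a careful combination of the properties of the function $g$ constructed by the densifier and the guarantee of the SQ algorithm. Recall that $S = g^{-1}(1) \cap h^{-1}(1)$. We consider the set $S' = g^{-1}(1) \cap f^{-1}(1)$. By the triangle inequality, we can bound the desired variation distance as follows:

\[
d_{TV}(\mathcal{U}_S, \mathcal{U}_{f^{-1}(1)}) \le d_{TV}(\mathcal{U}_{f^{-1}(1)}, \mathcal{U}_{S'}) + d_{TV}(\mathcal{U}_{S'}, \mathcal{U}_S). \tag{6}
\]

We will bound from above each term of the RHS in turn. To proceed we need an expression for the total variation distance between the uniform distribution on two finite sets. The following fact is obtained by straightforward calculation:

Fact 37. Let $A, B$ be subsets of a finite set $W$ and $\mathcal{U}_A, \mathcal{U}_B$ be the uniform distributions on $A, B$ respectively. Then,

\[
d_{TV}(\mathcal{U}_A, \mathcal{U}_B) = \frac{1}{2} \cdot \frac{|A \cap \overline{B}|}{|A|} + \frac{1}{2} \cdot \frac{|B \cap \overline{A}|}{|B|} + \frac{1}{2} \cdot |A \cap B| \cdot \left| \frac{1}{|A|} - \frac{1}{|B|} \right|. \tag{7}
\]

To bound the first term of the RHS of (6) we apply the above fact for $A = f^{-1}(1)$ and $B = S'$. Note that in this case $B \subseteq A$, hence the second term of (7) is zero. Regarding the first term, note that

\[
\frac{|A \cap \overline{B}|}{|A|} = \frac{|f^{-1}(1) \cap \overline{g^{-1}(1)}|}{|f^{-1}(1)|} \le \epsilon_1,
\]

where the inequality follows from Property (a) of the densifier definition. Similarly, for the third term we can write

\[
|A \cap B| \cdot \left| \frac{1}{|A|} - \frac{1}{|B|} \right| = |B| \cdot \left| \frac{1}{|A|} - \frac{1}{|B|} \right| = 1 - \frac{|B|}{|A|} \le \epsilon_1,
\]

where the inequality also follows from Property (a) of the densifier definition. We therefore conclude that $d_{TV}(\mathcal{U}_{f^{-1}(1)}, \mathcal{U}_{S'}) \le \epsilon_1$.

We now proceed to bound the second term of the RHS of (6) by applying Fact 37 for $A = S'$ and $B = S$. It turns out that bounding the individual terms of (7) is trickier in this case. For the first term we have:

\[
\frac{|A \cap \overline{B}|}{|A|} = \frac{|f^{-1}(1) \cap g^{-1}(1) \cap \overline{h^{-1}(1)}|}{|f^{-1}(1) \cap g^{-1}(1)|} = \frac{|f^{-1}(1) \cap g^{-1}(1) \cap \overline{h^{-1}(1)}|}{|g^{-1}(1)|} \cdot \frac{|g^{-1}(1)|}{|f^{-1}(1) \cap g^{-1}(1)|} \le \frac{\epsilon_2}{\gamma},
\]

where the last inequality follows from the guarantee of the SQ learning algorithm and Property (b) of the densifier definition. For the second term we have

\[
\frac{|B \cap \overline{A}|}{|B|} = \frac{|\overline{f^{-1}(1)} \cap g^{-1}(1) \cap h^{-1}(1)|}{|g^{-1}(1) \cap h^{-1}(1)|}.
\]

To analyze this term we recall that by the guarantee of the SQ algorithm it follows that the numerator satisfies

\[
|\overline{f^{-1}(1)} \cap g^{-1}(1) \cap h^{-1}(1)| \le \epsilon_2 \cdot |g^{-1}(1)|.
\]

From the same guarantee we also get

\[
|f^{-1}(1) \cap g^{-1}(1) \cap \overline{h^{-1}(1)}| \le \epsilon_2 \cdot |g^{-1}(1)|.
\]

Now, Property (b) of the densifier definition gives $|f^{-1}(1) \cap g^{-1}(1)| \ge \gamma \cdot |g^{-1}(1)|$. Combining these two inequalities implies that

\[
|g^{-1}(1) \cap h^{-1}(1)| \ge |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| \ge (\gamma - \epsilon_2) \cdot |g^{-1}(1)|.
\]

In conclusion, the second term is upper bounded by $(1/2) \cdot \dfrac{\epsilon_2}{\gamma - \epsilon_2}$.

For the third term, we can write

\[
|A \cap B| \cdot \left| \frac{1}{|A|} - \frac{1}{|B|} \right| = |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| \cdot \left| \frac{1}{|f^{-1}(1) \cap g^{-1}(1)|} - \frac{1}{|g^{-1}(1) \cap h^{-1}(1)|} \right|.
\]

To analyze this term we relate the cardinalities of these sets. In particular, we can write

\[
\begin{aligned}
|f^{-1}(1) \cap g^{-1}(1)| &= |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| + |f^{-1}(1) \cap g^{-1}(1) \cap \overline{h^{-1}(1)}| \\
&\le |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| + \epsilon_2 \cdot |g^{-1}(1)| \\
&\le |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| + \frac{\epsilon_2}{\gamma} \cdot |f^{-1}(1) \cap g^{-1}(1)|,
\end{aligned}
\]

where the last inequality is Property (b) of the densifier definition. Therefore, we obtain

\[
\Big(1 - \frac{\epsilon_2}{\gamma}\Big) \cdot |f^{-1}(1) \cap g^{-1}(1)| \le |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| \le |f^{-1}(1) \cap g^{-1}(1)|.
\]

Similarly, we have

\[
\begin{aligned}
|g^{-1}(1) \cap h^{-1}(1)| &= |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| + |\overline{f^{-1}(1)} \cap g^{-1}(1) \cap h^{-1}(1)| \\
&\le |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| + \frac{\epsilon_2}{\gamma - \epsilon_2} \cdot |g^{-1}(1) \cap h^{-1}(1)|
\end{aligned}
\]

and therefore

\[
\Big(1 - \frac{\epsilon_2}{\gamma - \epsilon_2}\Big) \cdot |g^{-1}(1) \cap h^{-1}(1)| \le |f^{-1}(1) \cap g^{-1}(1) \cap h^{-1}(1)| \le |g^{-1}(1) \cap h^{-1}(1)|.
\]

The above imply that the third term is bounded by $(1/2) \cdot \dfrac{\epsilon_2}{\gamma - \epsilon_2}$. This completes the proof of the claim. □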
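Fact 37, which drives the proof of Claim 36, is easy to sanity-check on small sets. The Python sketch below computes the total variation distance between two uniform distributions both directly from the definition and via the three-term expression (7), using exact rational arithmetic; the function names are illustrative and not from the paper.

```python
from fractions import Fraction


def tv_uniform_direct(A, B):
    """d_TV(U_A, U_B) computed from the definition (half the L1 distance)."""
    total = Fraction(0)
    for x in A | B:
        pa = Fraction(1, len(A)) if x in A else Fraction(0)
        pb = Fraction(1, len(B)) if x in B else Fraction(0)
        total += abs(pa - pb)
    return total / 2


def tv_uniform_fact37(A, B):
    """d_TV(U_A, U_B) computed via the three-term formula of Fact 37."""
    a, b = len(A), len(B)
    term1 = Fraction(len(A - B), a)                            # mass of U_A outside B
    term2 = Fraction(len(B - A), b)                            # mass of U_B outside A
    term3 = len(A & B) * abs(Fraction(1, a) - Fraction(1, b))  # mismatch on the overlap
    return (term1 + term2 + term3) / 2
```

For example, with $A = \{1,2,3,4\}$ and $B = \{3,4,5\}$ the three terms are $1/2$, $1/3$ and $1/6$, so both functions return $1/2$.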



With Lemma 31 established, to finish the proof of Theorem 30 it remains only to establish the claimed running time bound. This follows from a straightforward (but somewhat tedious) verification, using the running time bounds established in Lemma 24, Proposition 25, Corollary 27, Claim 28 and Claim 29. □

3.4 Getting from ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ to $\mathcal{A}^{\mathcal{C}}_{inv}$: An approximate evaluation oracle. Recall that the algorithm ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ from the previous subsection is only guaranteed (with high probability) to output a sampler for a hypothesis distribution $\hat{D}$ that is statistically close to the target distribution $\mathcal{U}_{f^{-1}(1)}$ if it is given an input parameter $\hat{p}$ satisfying $p \le \hat{p} < (1 + \epsilon)p$, where $p = \Pr_{x \in \mathcal{U}_n}[f(x) = 1]$. Given this, a natural idea is to run ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ a total of $k = O(n/\epsilon)$ times, using "guesses" for $\hat{p}$ that increase multiplicatively as powers of $1 + \epsilon$, starting at $1/2^n$ (the smallest possible value) and going up to 1. This yields hypothesis distributions $\hat{D}_1, \ldots, \hat{D}_k$ where $\hat{D}_i$ is the distribution obtained by setting $\hat{p}$ to $\hat{p}_i = (1 + \epsilon)^{i-1}/2^n$. With such distributions in hand, an obvious approach is to use the "hypothesis testing" machinery of Section 2 to identify a high-accuracy $\hat{D}_i$ from this collection.
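The multiplicative grid of guesses can be illustrated with a short Python sketch. It builds $\hat{p}_i = (1+\epsilon)^{i-1}/2^n$ and locates the first guess that is at least the true $p$, which then lies in $[p, (1+\epsilon)p)$. The function names are hypothetical, and letting the last grid point reach or slightly exceed 1 (so that values of $p$ near 1 are still bracketed) is a simplifying assumption of this sketch.

```python
import math


def guess_grid(n, eps):
    """Guesses p_i = (1 + eps)^(i - 1) / 2^n for i = 1..k, with k = O(n / eps)."""
    # Smallest k whose last guess is >= 1 (assumption: the grid may overshoot 1 slightly).
    k = math.ceil(n * math.log(2.0) / math.log(1.0 + eps)) + 1
    return [(1.0 + eps) ** (i - 1) / 2.0 ** n for i in range(1, k + 1)]


def bracketing_guess(p, grid):
    """Return the first guess p_hat in the grid with p_hat >= p, or None."""
    for p_hat in grid:
        if p_hat >= p:
            return p_hat
    return None
```

Since consecutive guesses differ by a factor of exactly $1+\epsilon$, the first guess at or above $p$ is always below $(1+\epsilon)p$, and the grid has $O(n/\epsilon)$ points for small $\epsilon$ because $\ln(1+\epsilon) = \Theta(\epsilon)$.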

This is indeed the path we follow, but some care is needed to make the approach go through. Recall that as described in Proposition 12, the hypothesis testing algorithm requires the following:

1. independent samples from the target distribution $\mathcal{U}_{f^{-1}(1)}$ (this is not a problem since such samples are available in our framework);

2. independent samples from $\hat{D}_i$ for each $i$ (also not a problem since the $i$-th run of algorithm ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ outputs a sampler for distribution $\hat{D}_i$); and

3. a $(1 + O(\epsilon))$-approximate evaluation oracle $\mathrm{EVAL}_{\hat{D}'_i}$ for each distribution $\hat{D}'_i$.

In this subsection we show how to construct item (3) above, the approximate evaluation oracle. In more detail, we first describe a randomized procedure Check which is applied to the output of each execution of ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ (across all $k$ different settings of the input parameter $\hat{p}_i$). We show that with high probability the "right" value $\hat{p}_{i^*}$ (the one which satisfies $p \le \hat{p}_{i^*} < (1 + \epsilon)p$) will pass the procedure Check. Then we show that for each value $\hat{p}_i$ that passed the check a simple deterministic algorithm gives the desired approximate evaluation oracle for $\hat{D}'_i$.

We proceed to describe the Check procedure and characterize its performance. 



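Algorithm Check$(g, h, \delta', \epsilon)$:

Input: functions $g$ and $h$ as described in Lemma 38, a confidence parameter $\delta'$, and an accuracy parameter $\epsilon$.

Output: If $|h^{-1}(1) \cap g^{-1}(1)| / |g^{-1}(1)| \ge \gamma/2$, with probability $1 - \delta'$ outputs a pair $(\alpha, \kappa)$ such that

\[
\left| \alpha - \frac{|h^{-1}(1) \cap g^{-1}(1)|}{|g^{-1}(1)|} \right| \le \mu \cdot \frac{|h^{-1}(1) \cap g^{-1}(1)|}{|g^{-1}(1)|} \quad \text{and} \quad \frac{|g^{-1}(1)|}{1 + \tau} \le \kappa \le (1 + \tau)\,|g^{-1}(1)|,
\]

where $\mu = \tau = \epsilon/40000$.

1. Sample $m = O(\log(2/\delta')/(\gamma\mu^2))$ points $x^1, \ldots, x^m$ from $\mathcal{A}^{\mathcal{C}'}_{gen}(g, \gamma/4, \delta'/(2m))$. If any $x^j = \perp$ halt and output "failure."

2. Let $\alpha$ be $(1/m)$ times the number of points $x^j$ that have $h(x^j) = 1$.

3. Call $\mathcal{A}^{\mathcal{C}'}_{count}(\tau, \delta'/2)$ on $g$ and set $\kappa$ to $2^n$ times the value it returns.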
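The three steps of Check translate directly into code. The Python sketch below is a hedged illustration: `g_generator` (one call to the approximate uniform generator for $g$, returning `None` for $\perp$) and `approx_count` (returning an estimate of $|g^{-1}(1)|/2^n$) are hypothetical callables standing in for the paper's subroutines, and the constant `c` in the sample size is an assumption.

```python
import math


def check(g_generator, h, approx_count, n, gamma, mu, delta_prime, c=3.0):
    """Sketch of Check: estimate alpha ~ |h^{-1}(1) & g^{-1}(1)| / |g^{-1}(1)| and kappa ~ |g^{-1}(1)|.

    Returns the string "failure" if any generator call yields None (bottom),
    otherwise the pair (alpha, kappa).
    """
    # Step 1: draw m = O(log(2/delta') / (gamma * mu^2)) points from the generator.
    m = math.ceil(c * math.log(2.0 / delta_prime) / (gamma * mu ** 2))
    hits = 0
    for _ in range(m):
        x = g_generator()
        if x is None:
            return "failure"
        if h(x) == 1:
            hits += 1
    # Step 2: alpha is the empirical fraction of draws accepted by h.
    alpha = hits / m
    # Step 3: kappa is 2^n times the (approximate) fraction of satisfying assignments of g.
    kappa = 2 ** n * approx_count()
    return alpha, kappa
```

With the paper's setting $\mu = \epsilon/40000$ the sample size $m$ is very large, so any experiment with this sketch would use a much coarser $\mu$.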



Lemma 38. Fix $i \in [k]$. Consider a sequence of $k$ runs of ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ where in the $i$-th run it is given $\hat{p}_i = (1 + \epsilon)^{i-1}/2^n$ as its input parameter. Let $g_i$ be the function in $\mathcal{C}'$ constructed by ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ in Step 1 of its $i$-th run and $h_i$ be the hypothesis function constructed by ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ in Step 2(e) of its $i$-th run. Suppose Check is given as input $g_i$, $h_i$, a confidence parameter $\delta'$, and an accuracy parameter $\epsilon'$. Then it either outputs "no" or a pair $(\alpha_i, \kappa_i) \in [0,1] \times [0, 2^{n+1}]$, and satisfies the following performance guarantee: If $|h_i^{-1}(1) \cap g_i^{-1}(1)| / |g_i^{-1}(1)| \ge \gamma/2$ then with probability at least $1 - \delta'$ Check outputs a pair $(\alpha_i, \kappa_i)$ such that

\[
\left| \alpha_i - \frac{|h_i^{-1}(1) \cap g_i^{-1}(1)|}{|g_i^{-1}(1)|} \right| \le \mu \cdot \frac{|h_i^{-1}(1) \cap g_i^{-1}(1)|}{|g_i^{-1}(1)|} \tag{8}
\]

and

\[
\frac{|g_i^{-1}(1)|}{1 + \tau} \le \kappa_i \le (1 + \tau)\,|g_i^{-1}(1)|, \tag{9}
\]

where $\mu = \tau = \epsilon/40000$.

Proof. Suppose that $i$ is such that $|h_i^{-1}(1) \cap g_i^{-1}(1)| / |g_i^{-1}(1)| \ge \gamma/2$. Recall from Definition 9 that each point $x^j$ drawn from $\mathcal{A}^{\mathcal{C}'}_{gen}(g_i, \gamma/4, \delta'/(2m))$ in Step 1 is with probability $1 - \delta'/(2m)$ distributed according to $D_{g_i,\gamma/4}$; by a union bound we have that with probability at least $1 - \delta'/2$ all $m$ points are distributed this way (and thus none of them are $\perp$). We condition on this going forward. Definition 9 implies that $d_{TV}(D_{g_i,\gamma/4}, \mathcal{U}_{g_i^{-1}(1)}) \le \gamma/4$; together with the assumption that $|h_i^{-1}(1) \cap g_i^{-1}(1)| / |g_i^{-1}(1)| \ge \gamma/2$, this implies that each $x^j$ independently has probability at least $\gamma/4$ of having $h(x^j) = 1$. Consequently, by the choice of $m$ in Step 1, a standard multiplicative Chernoff bound implies that

\[
\left| \alpha_i - \frac{|h_i^{-1}(1) \cap g_i^{-1}(1)|}{|g_i^{-1}(1)|} \right| \le \mu \cdot \frac{|h_i^{-1}(1) \cap g_i^{-1}(1)|}{|g_i^{-1}(1)|}
\]

with failure probability at most $\delta'/4$, giving (8).

Finally, Definition 8 gives that (9) holds with failure probability at most $\delta'/4$. This concludes the proof. □



Next we show how a high-accuracy estimate $\alpha_i$ of $|h_i^{-1}(1) \cap g_i^{-1}(1)| / |g_i^{-1}(1)|$ yields a deterministic approximate evaluation oracle for $\hat{D}'_i$.



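Lemma 39. Algorithm Simulate-Approx-Eval (which is deterministic) takes as input a value $\alpha \in [0,1]$, a string $x \in \{-1,1\}^n$, a parameter $\kappa$, (a circuit for) $h : \{-1,1\}^n \to \{-1,1\}$, and (a representation for) $g : \{-1,1\}^n \to \{-1,1\}$, $g \in \mathcal{C}'$, where $h, g$ are obtained from a run of ${\mathcal{A}'}^{\mathcal{C}}_{inv}$. Suppose that

\[
\left| \alpha - \frac{|h^{-1}(1) \cap g^{-1}(1)|}{|g^{-1}(1)|} \right| \le \mu \cdot \frac{|h^{-1}(1) \cap g^{-1}(1)|}{|g^{-1}(1)|} \quad \text{and} \quad \frac{|g^{-1}(1)|}{1 + \tau} \le \kappa \le (1 + \tau)\,|g^{-1}(1)|,
\]

where $\mu = \tau = \epsilon/40000$. Then Simulate-Approx-Eval outputs a value $\rho$ such that

\[
\frac{\hat{D}'(x)}{1 + \beta} \le \rho \le (1 + \beta)\,\hat{D}'(x), \tag{10}
\]

where $\beta = \epsilon/192$, $\hat{D}$ is the output distribution constructed in Step 3 of the run of ${\mathcal{A}'}^{\mathcal{C}}_{inv}$ that produced $h, g$, and $\hat{D}'$ is $\hat{D}$ conditioned on $\{-1,1\}^n$ (excluding $\perp$).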

Proof. The Simulate-Approx-Eval procedure is very simple. Given an input $x \in \{-1,1\}^n$ it evaluates both $g$ and $h$ on $x$, and if either evaluates to $-1$ it returns the value 0. If both evaluate to 1 then it returns the value $1/(\kappa\alpha)$.

For the correctness proof, note first that it is easy to see from the definition of the sampler $C_f$ (Step 3 of ${\mathcal{A}'}^{\mathcal{C}}_{inv}$) and Definition 9 (recall that the approximate uniform generator $\mathcal{A}^{\mathcal{C}'}_{gen}(g)$ only outputs strings that satisfy $g$) that if $x \in \{-1,1\}^n$, $x \notin h^{-1}(1) \cap g^{-1}(1)$ then $\hat{D}'$ has zero probability of outputting $x$, so Simulate-Approx-Eval behaves appropriately in this case.

Now suppose that $h(x) = g(x) = 1$. We first show that the value $1/(\kappa\alpha)$ is multiplicatively close to $1/|h^{-1}(1) \cap g^{-1}(1)|$. Let us write $A$ to denote $|g^{-1}(1)|$ and $B$ to denote $|h^{-1}(1) \cap g^{-1}(1)|$. With this notation we have

\[
\left| \alpha - \frac{B}{A} \right| \le \mu \cdot \frac{B}{A} \quad \text{and} \quad \frac{A}{1 + \tau} \le \kappa \le (1 + \tau)A.
\]

Consequently, we have

\[
B(1 - \mu - \tau) \le B \cdot \frac{1 - \mu}{1 + \tau} = \frac{B}{A}(1 - \mu) \cdot \frac{A}{1 + \tau} \le \kappa\alpha \le \frac{B}{A}(1 + \mu) \cdot (1 + \tau)A \le B(1 + 2\mu + 2\tau),
\]

and hence

\[
\frac{1}{B} \cdot \frac{1}{1 + 2\mu + 2\tau} \le \frac{1}{\kappa\alpha} \le \frac{1}{B} \cdot \frac{1}{1 - \mu - \tau}. \tag{11}
\]

Now consider any $x \in h^{-1}(1) \cap g^{-1}(1)$. By Definition 9 we have that

\[
\frac{1}{1 + \epsilon_3} \cdot \frac{1}{A} \le D_{g,\epsilon_3}(x) \le (1 + \epsilon_3) \cdot \frac{1}{A}.
\]

Since a draw from $\hat{D}'$ is obtained by taking a draw from $D_{g,\epsilon_3}$ and conditioning on it lying in $h^{-1}(1)$, it follows that we have

\[
\frac{1}{1 + \epsilon_3} \cdot \frac{1}{B} \le \hat{D}'(x) \le (1 + \epsilon_3) \cdot \frac{1}{B}.
\]

Combining this with (11) and recalling that $\mu = \tau = \epsilon/40000$ and $\epsilon_3 = \epsilon\gamma/48000$, we get (10) as desired. □
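The Simulate-Approx-Eval procedure described in the proof of Lemma 39 is a two-line deterministic function. A minimal Python sketch, with `g` and `h` passed as callables returning $\pm 1$ (an assumption about representation made for illustration):

```python
def simulate_approx_eval(alpha, x, kappa, h, g):
    """Sketch of Simulate-Approx-Eval.

    Returns 0 if x is outside h^{-1}(1) & g^{-1}(1); otherwise returns 1 / (kappa * alpha),
    which approximates 1 / |h^{-1}(1) & g^{-1}(1)| when alpha ~ |h & g| / |g| and kappa ~ |g|.
    """
    if g(x) != 1 or h(x) != 1:
        return 0.0
    return 1.0 / (kappa * alpha)
```

As a toy check with $n = 3$, $g(x) = 1$ iff $x_1 = 1$ and $h(x) = 1$ iff $x_2 = 1$, the exact values are $\kappa = 4$ and $\alpha = 1/2$, so the function returns $1/2 = 1/|h^{-1}(1) \cap g^{-1}(1)|$ on any point accepted by both.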

3.5 The final algorithm: Proof of Theorem 21. Finally we are ready to give the inverse approximate uniform generation algorithm $\mathcal{A}^{\mathcal{C}}_{inv}$ for $\mathcal{C}$.

Algorithm $\mathcal{A}^{\mathcal{C}}_{inv}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta)$:

Input: Independent samples from $\mathcal{U}_{f^{-1}(1)}$, accuracy and confidence parameters $\epsilon, \delta$.

Output: With probability $1 - \delta$ outputs an $\epsilon$-sampler $C_f$ for $\mathcal{U}_{f^{-1}(1)}$.

1. For $i = 1$ to $k = O(n/\epsilon)$:

   (a) Set $\hat{p}_i := (1 + \epsilon)^{i-1}/2^n$.

   (b) Run ${\mathcal{A}'}^{\mathcal{C}}_{inv}(\mathcal{U}_{f^{-1}(1)}, \epsilon/12, \delta/3, \hat{p}_i)$. Let $g_i \in \mathcal{C}'$ be the function constructed in Step 1, $h_i$ be the hypothesis function constructed in Step 2(e), and $(C_f)_i$ be the sampler for distribution $\hat{D}_i$ constructed in Step 3.

   (c) Run Check$(g_i, h_i, \delta/3, \epsilon)$. If it returns a pair $(\alpha_i, \kappa_i)$ then add $i$ to the set $S$ (initially empty).

2. Run the hypothesis testing procedure $\mathcal{T}^{\mathcal{U}_{f^{-1}(1)}}$ over the set $\{\hat{D}'_i\}_{i \in S}$ of hypothesis distributions, using accuracy parameter $\epsilon/12$ and confidence parameter $\delta/3$. Here $\mathcal{T}^{\mathcal{U}_{f^{-1}(1)}}$ is given access to $\mathcal{U}_{f^{-1}(1)}$, uses the samplers $(C_f)_i$ to generate draws from distributions $\hat{D}'_i$ (see Remark 18), and uses the procedure Simulate-Approx-Eval$(\alpha_i, \kappa_i, h_i, g_i)$ for the $(1 + \epsilon/192)$-approximate evaluation oracle $\mathrm{EVAL}_{\hat{D}'_i}$ for $\hat{D}'_i$. Let $i^* \in S$ be the index of the distribution that it returns.

3. Output the sampler $(C_f)_{i^*}$.



Proof of Theorem 21. Let $p = \Pr_{x \in \mathcal{U}_n}[f(x) = 1]$ denote the true fraction of satisfying assignments for $f$ in $\{-1,1\}^n$. Let $i^*$ be the element of $[k]$ such that $p \le \hat{p}_{i^*} < (1 + \epsilon/6)p$. By Theorem 30, with probability at least $1 - \delta/3$ we have that both

(i) $(C_f)_{i^*}$ is a sampler for a distribution $\hat{D}_{i^*}$ such that $d_{TV}(\hat{D}_{i^*}, \mathcal{U}_{f^{-1}(1)}) \le \epsilon/6$; and

(ii) $|h_{i^*}^{-1}(1) \cap g_{i^*}^{-1}(1)| / |g_{i^*}^{-1}(1)| \ge \gamma/2$.

We condition on these two events holding. By Lemma 38, with probability at least $1 - \delta/3$ the procedure Check outputs a value $\alpha_{i^*}$ such that

\[
\left| \alpha_{i^*} - \frac{|h_{i^*}^{-1}(1) \cap g_{i^*}^{-1}(1)|}{|g_{i^*}^{-1}(1)|} \right| \le \mu \cdot \frac{|h_{i^*}^{-1}(1) \cap g_{i^*}^{-1}(1)|}{|g_{i^*}^{-1}(1)|}
\]

for $\mu = \epsilon/40000$. We condition on this event holding. Now Lemma 39 implies that Simulate-Approx-Eval$((C_f)_{i^*})$ meets the requirements of a $(1 + \beta)$-approximate evaluation oracle $\mathrm{EVAL}_{\hat{D}'_{i^*}}$ from Proposition 12, for $\beta = \epsilon/192$. Hence by Proposition 12 (or more precisely by Remark 18), with probability at least $1 - \delta/3$ the index $i^*$ that $\mathcal{T}^{\mathcal{U}_{f^{-1}(1)}}$ returns is such that $\hat{D}'_{i^*}$ is an $\epsilon/2$-sampler for $\mathcal{U}_{f^{-1}(1)}$, as desired.

As in the proof of Theorem 30, the claimed running time bound is a straightforward consequence of the various running time bounds established for all the procedures called by $\mathcal{A}^{\mathcal{C}}_{inv}$. This concludes the proof of our general positive result, Theorem 21. □



4 Linear Threshold Functions 

In this section we apply our general framework from Section 3 to prove Theorem 2, i.e., obtain a polynomial time algorithm for the problem of inverse approximate uniform generation for the class $\mathcal{C} = \mathrm{LTF}_n$ of $n$-variable linear threshold functions over $\{-1,1\}^n$. More formally, we prove:

Theorem 40. There is an algorithm $\mathcal{A}^{\mathrm{LTF}}_{inv}$ which is a $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$-time inverse approximate uniform generation algorithm for the class $\mathrm{LTF}_n$.

The above theorem will follow as an application of Theorem |2T] for C = C = LTF n . The literature 
provides us with three of the four ingredients that our general approach requires for LTFs - approximate uni- 
form generation, approximate counting, and Statistical Query learning - and our main technical contribution 
is giving the fourth necessary ingredient, a densifier. We start by recalling the three known ingredients in 
the following subsection. 
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4.1 Tools from the literature. We first record two efficient algorithms for approximate uniform genera- 
tion and approximate counting for LTF n , due to Dyer [Dye03 |: 



Theorem 41. (approximate uniform generation for LTF n , ^Dye03^ ) There is an algorithm *4g^ F that on 
input (a weights-based representation of) an arbitrary h G LTF n and a confidence parameter 5 > 0, runs 
in time poly(n, log(l/(5)) and with probability 1 — 5 outputs a point x such that x ~ U^-iny 

We note that the above algorithm gives us a somewhat stronger guarantee than that in Definition [9] Indeed, 
the algorithm -4gc^ F with high probability outputs a point x G {— 1, l} n whose distribution is exactly U^-in) 
(as opposed to a point whose distribution is close to U h ~i(if). 

Theorem 42. (approximate counting for $\mathrm{LTF}_n$, [Dye03]) There is an algorithm $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{count}}$ that on input (a
weights-based representation of) an arbitrary $h \in \mathrm{LTF}_n$, an accuracy parameter $\epsilon > 0$ and a confidence
parameter $\delta > 0$, runs in time $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$ and outputs $\widehat{p} \in [0,1]$ that with probability $1 - \delta$
satisfies $\widehat{p} \in [1-\epsilon, 1+\epsilon] \cdot \Pr_{x \sim \mathcal{U}_n}[h(x) = 1]$.

We also need an efficient SQ learning algorithm for halfspaces. This is provided to us by a result of Blum
et al. [BFKV97]:

Theorem 43. (SQ learning algorithm for $\mathrm{LTF}_n$, [BFKV97]) There is a distribution-independent SQ
learning algorithm $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{SQ}}$ for $\mathrm{LTF}_n$ that has running time $t_1 = \mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$, uses at most
$t_2 = \mathrm{poly}(n)$ time to evaluate each query, and requires tolerance of its queries no smaller than $\tau =
1/\mathrm{poly}(n, 1/\epsilon)$.

4.2 A densifier for $\mathrm{LTF}_n$. The last ingredient we need in order to apply our Theorem 21 is a computationally
efficient densifier for $\mathrm{LTF}_n$. This is the main technical contribution of this section and is summarized
in the following theorem:

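Theorem 44. (efficient proper densifier for $\mathrm{LTF}_n$) Set $\gamma(\epsilon, \delta, n) \stackrel{\mathrm{def}}{=} \Theta(\delta/(n^2 \log n))$. There is an
$(\epsilon, \gamma, \delta)$-densifier $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{den}}$ for $\mathrm{LTF}_n$ that, for any input parameters $0 < \epsilon, \delta$ and $1/2^n \le \widehat{p} \le 1$,
outputs a function $g \in \mathrm{LTF}_n$ and runs in time $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$.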

Discussion and intuition. Before we prove Theorem 44 we provide some intuition. Let $f \in \mathrm{LTF}_n$ be the
unknown LTF and suppose that we would like to design an $(\epsilon, \gamma, \delta)$-densifier $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{den}}$ for $f$. That is, given
sample access to $\mathcal{U}_{f^{-1}(1)}$, and a number $\widehat{p}$ satisfying $p \le \widehat{p} < (1+\epsilon)p$, where $p = \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$,
we would like to efficiently compute (a weights-based representation for) an LTF $g : \{-1,1\}^n \to \{-1,1\}$
such that the following conditions are satisfied:

(a) $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon$, and

(b) $\Pr_{x \sim \mathcal{U}_n}[g(x) = 1] \le (1/\gamma) \cdot \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$.

(While condition (b) above appears slightly different than property (b) in our Definition 20, because of
property (a), the two statements are essentially equivalent up to a factor of $1/(1-\epsilon)$ in the value of $\gamma$.)

We start by noting that it is easy to handle the case that $\widehat{p}$ is large. In particular, observe that if $\widehat{p} \ge 2\gamma$
then $p = \Pr_{x \sim \mathcal{U}_n}[f(x) = 1] \ge \widehat{p}/(1+\epsilon) \ge \widehat{p}/2 \ge \gamma$, and we can just output $g \equiv 1$ since it clearly
satisfies both properties of the definition. For the following intuitive discussion we will henceforth assume
that $\widehat{p} < 2\gamma$.

Recall that our desired function $g$ is an LTF, i.e., $g(x) = \mathrm{sign}(v \cdot x - t)$, for some $(v, t) \in \mathbb{R}^{n+1}$. Recall
also that our densifier has sample access to $\mathcal{U}_{f^{-1}(1)}$, so it can obtain random positive examples of $f$, each
of which gives a linear constraint over the $v, t$ variables. Hence a natural first approach is to attempt to
construct an appropriate linear program over these variables whose feasible solutions satisfy conditions (a)
and (b) above. We begin by analyzing this approach; while it turns out to not quite work, it will give us
valuable intuition for our actual algorithm, which is presented further below.

Note that following this approach, condition (a) is relatively easy to satisfy. Indeed, consider any $\epsilon > 0$
and suppose we want to construct an LTF $g = \mathrm{sign}(v \cdot x - t)$ such that $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon$.
This can be done as follows: draw a set $S_+$ of $N_+ = \Theta((1/\epsilon) \cdot (n^2 + \log(1/\delta)))$ samples from $\mathcal{U}_{f^{-1}(1)}$
and consider a linear program $\mathcal{LP}_+$ with variables $(w, \theta) \in \mathbb{R}^{n+1}$ that enforces all these examples to be
positive. That is, for each $x \in S_+$, we will have an inequality $w \cdot x \ge \theta$. It is clear that $\mathcal{LP}_+$ is feasible (any
weights-based representation for $f$ is a feasible solution) and that it can be solved in $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$
time, since it is defined by $O(N_+)$ many linear constraints and the coefficients of the constraint matrix are in
$\{\pm 1\}$. The following simple claim shows that with high probability any feasible solution of $\mathcal{LP}_+$ satisfies
condition (a):

Claim 45. With probability at least $1 - \delta$ over the sample $S_+$, any $g \in \mathrm{LTF}_n$ consistent with $S_+$ satisfies
condition (a).

Proof. Consider an LTF $g$ and suppose that it does not satisfy condition (a), i.e., $\Pr_{x \sim \mathcal{U}_n}[g(x) = -1 \mid f(x) = 1] > \epsilon$.
Since each sample $x \in S_+$ is uniformly distributed in $f^{-1}(1)$, the probability it does not "hit" the
set $g^{-1}(-1) \cap f^{-1}(1)$ is at most $1 - \epsilon$. The probability that no sample in $S_+$ hits $g^{-1}(-1) \cap f^{-1}(1)$ is thus
at most $(1-\epsilon)^{N_+} \le \delta/2^{n^2}$. Recalling that there exist at most $2^{n^2}$ distinct LTFs over $\{-1,1\}^n$ [Mur71],
it follows by a union bound that the probability there exists an LTF that does not satisfy condition (a) is at
most $\delta$ as desired. □
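
As a sanity check on the sample size used in Claim 45, the sketch below computes one concrete choice of $N_+$ and verifies the union bound in log-space. The constant hidden in the $\Theta(\cdot)$ is an assumption made here (the text only gives the asymptotic form), and the function names `n_plus` and `union_bound_log` are hypothetical.

```python
import math

def n_plus(n: int, eps: float, delta: float) -> int:
    """One concrete N_+ with (1 - eps)^N_+ <= delta / 2^(n^2).

    Uses (1 - eps)^N <= exp(-eps * N). The constant factor is an
    illustrative assumption; the text only states
    N_+ = Theta((1/eps) * (n^2 + log(1/delta))).
    """
    return math.ceil((n * n * math.log(2) + math.log(1 / delta)) / eps)

def union_bound_log(n: int, eps: float, delta: float) -> float:
    """Natural log of 2^(n^2) * (1 - eps)^N_+.

    This upper-bounds the probability that some LTF consistent with S_+
    violates condition (a).
    """
    big_n = n_plus(n, eps, delta)
    return n * n * math.log(2) + big_n * math.log1p(-eps)
```

For any parameters, `union_bound_log(n, eps, delta)` should be at most `math.log(delta)`, which is the statement of the claim for this choice of $N_+$.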

The above claim directly implies that with high probability any feasible solution $(w^*, \theta^*)$ to $\mathcal{LP}_+$ is
such that $g^*(x) = \mathrm{sign}(w^* \cdot x - \theta^*)$ satisfies condition (a). Of course, an arbitrary feasible solution to
$\mathcal{LP}_+$ is by no means guaranteed to satisfy condition (b). (Note for example that the constant 1 function is
certainly feasible for $\mathcal{LP}_+$.) Hence, a natural idea is to include additional constraints in our linear program
so that condition (b) is also satisfied.

Along these lines, consider the following procedure: Draw a set $S_-$ of $N_- = \lfloor \delta/\widehat{p} \rfloor$ uniform unlabeled
samples from $\{-1,1\}^n$ and label them negative. That is, for each sample $x \in S_-$, we add the constraint
$w \cdot x < \theta$ to our linear program. Let $\mathcal{LP}$ be the linear program that contains all the constraints defined
by $S_+ \cup S_-$. It is not hard to prove that with probability at least $1 - 2\delta$ over the sample $S_-$, we have that
$S_- \subseteq f^{-1}(-1)$ and hence (any weights-based representation of) $f$ is a feasible solution to $\mathcal{LP}$. In fact, it is
possible to show that if $\gamma$ is sufficiently small (roughly, $\gamma \le \delta/(4(n^2 + \log(1/\delta)))$ is what is required)
then with high probability each solution to $\mathcal{LP}$ also satisfies condition (b). The catch, of course, is that the
above procedure is not computationally efficient because $N_-$ may be very large: if $\widehat{p}$ is very small, then it
is infeasible even to write down the linear program $\mathcal{LP}$.
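
The feasibility claim for $f$ can be seen from a union bound over the negative-labeled samples. The following short derivation is a sketch added here; it uses only $p \le \widehat{p}$ and $N_- \le \delta/\widehat{p}$, where $S_-$ denotes the set of uniform samples labeled negative and $p = \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$:

```latex
\Pr\bigl[ S_- \not\subseteq f^{-1}(-1) \bigr]
  \;\le\; \sum_{x \in S_-} \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]
  \;=\; N_- \cdot p
  \;\le\; \frac{\delta}{\widehat{p}} \cdot p
  \;\le\; \delta
  \;\le\; 2\delta .
```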

Algorithm Description. The above discussion motivates our actual densifier algorithm as follows: The
problem with the above described naive approach is that it generates (the potentially very large set) $S_-$ all
at once at the beginning of the algorithm. Note that having a large set $S_-$ is not necessarily in and of itself
a problem, since one could potentially use the ellipsoid method to solve $\mathcal{LP}$ if one could obtain an efficient
separation oracle. Thus intuitively, if one had an online algorithm which would generate $S_-$ on the fly, then
one could potentially get a feasible solution to $\mathcal{LP}$ in polynomial time. This serves as the intuition behind
our actual algorithm.

More concretely, our densifier $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{den}}$ will invoke a computationally efficient online learning algorithm
for LTFs. In particular, $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{den}}$ will run the online learner $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ for a sequence of stages and in each stage
it will provide as counterexamples to $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ judiciously chosen labeled examples, which will be positive
for the online learner's current hypothesis, but negative for $f$ (with high probability). Since $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ makes a
small number of mistakes in the worst case, this process is guaranteed to terminate after a small number of
stages (since in each stage we force the online learner to make a mistake).

We now provide the details. We start by recalling the notion of online learning for a class $\mathcal{C}$ of Boolean
functions. In the online model, learning proceeds in a sequence of stages. In each stage the learning
algorithm is given an unlabeled example $x \in \{-1,1\}^n$ and is asked to predict the value $f(x)$, where $f \in \mathcal{C}$
is the unknown target concept. After the learning algorithm makes its prediction, it is given the correct value
of $f(x)$. The goal of the learner is to identify $f$ while minimizing the total number of mistakes. We say that
an online algorithm learns class $\mathcal{C}$ with mistake bound $M$ if it makes at most $M$ mistakes on any sequence
of examples consistent with some $f \in \mathcal{C}$. Our densifier makes essential use of a computationally efficient
online learning algorithm for the class of linear threshold functions by Maass and Turan [MT94]:

Theorem 46. ([MT94], Theorem 3.3) There exists a $\mathrm{poly}(n)$ time deterministic online learning algorithm
$\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ for the class $\mathrm{LTF}_n$ with mistake bound $M(n) \stackrel{\mathrm{def}}{=} \Theta(n^2 \log n)$. In particular, at every stage of its
execution, the current hypothesis maintained by $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ is a (weights-based representation of an) LTF that
is consistent with all labeled examples received so far.

We note that the above algorithm works by reducing the problem of online learning for LTFs to a convex
optimization problem. Hence, one can use any efficient convex optimization algorithm to do online learning
for LTFs, e.g. the ellipsoid method [Kha80, GLS88]. The mistake bound in the above theorem follows by
plugging in the algorithm of Vaidya [Vai89, Vai96].

We now proceed with a more detailed description of our densifier followed by pseudocode and a proof of
correctness. As previously mentioned, the basic idea is to execute the online learner to learn $f$ while cleverly
providing counterexamples to it in each stage of its execution. Our algorithm starts by sampling $N_+$ samples
from $\mathcal{U}_{f^{-1}(1)}$ and making sure that these are classified correctly by the online learner. This step guarantees
that our final solution will satisfy condition (a) of the densifier. Let $h \in \mathrm{LTF}_n$ be the current hypothesis at
the end of this process. If $h$ satisfies condition (b) (we can efficiently decide this by using our approximate
counter for $\mathrm{LTF}_n$), we output $h$ and terminate the algorithm. Otherwise, we use our approximate uniform
generator to construct a uniform satisfying assignment $x \in \mathcal{U}_{h^{-1}(1)}$ and we label it negative, i.e., we give the
labeled example $(x, -1)$ as a counterexample to the online learner. Since $h$ does not satisfy condition (b),
i.e., it has "many" satisfying assignments, it follows that with high probability (roughly, at least $1 - \gamma$) over
the choice of $x \in \mathcal{U}_{h^{-1}(1)}$, the point $x$ output by the generator will indeed be negative for $f$. We continue
this process for a number of stages. If all counterexamples thus generated are indeed consistent with $f$ (this
happens with probability roughly $1 - \gamma \cdot M$, where $M = M(n) = \Theta(n^2 \log n)$ is an upper bound on the
number of stages), after at most $M$ stages we have either found a hypothesis $h$ satisfying condition (b) or
the online learner terminates. In the latter case, the current hypothesis of the online learner is identical to $f$,
as follows from Theorem 46. (Note that the above argument puts an upper bound of $O(\delta/M)$ on the value
of $\gamma$.) Detailed pseudocode follows:



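Algorithm $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{den}}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \widehat{p})$:

Input: Independent samples from $\mathcal{U}_{f^{-1}(1)}$, parameters $\epsilon, \delta > 0$, and a value $1/2^n \le \widehat{p} \le 1$.

Output: If $p \le \widehat{p} < (1+\epsilon)p$, with probability $1 - \delta$ outputs a function $g \in \mathrm{LTF}_n$ satisfying conditions
(a) and (b).

1. Draw a set $S_+$ of $N_+ = \Theta((1/\epsilon) \cdot (n^2 + \log(1/\delta)))$ examples from $\mathcal{U}_{f^{-1}(1)}$.

2. Initialize $i = 0$ and set $M \stackrel{\mathrm{def}}{=} \Theta(n^2 \log n)$.
   While ($i < M$) do the following:

   (a) Execute the $i$-th stage of $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ and let $h^{(i)} \in \mathrm{LTF}_n$ be its current hypothesis.

   (b) If there exists $x \in S_+$ with $h^{(i)}(x) = -1$ do the following:

       • Give the labeled example $(x, 1)$ as a counterexample to $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$.

       • Set $i = i + 1$ and go to Step 2.

   (c) Run $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{count}}(h^{(i)}, \epsilon, \delta/(4M))$ and let $\widehat{p}_i$ be its output.

   (d) Set $\gamma = \delta/(16M)$. If $\widehat{p}_i \le \widehat{p}/(\gamma \cdot (1+\epsilon)^2)$ then output $h^{(i)}$;

   (e) otherwise, do the following:

       • Run $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{gen}}(h^{(i)}, \delta/(4M))$ and let $x^{(i)}$ be its output.

       • Give the point $(x^{(i)}, -1)$ as a counterexample to $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$.

       • Set $i = i + 1$ and go to Step 2.

3. Output the current hypothesis of $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$.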
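
The control flow of the pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the online learner, counter, and generator are passed in as callables, and the brute-force helpers (`cube`, `brute_count`, `brute_generate`) and the toy `VersionSpaceLearner` are hypothetical stand-ins for the Maass-Turan learner and Dyer's algorithms that only make sense for tiny $n$. The optional `gamma` argument is an illustration-only knob; by default the value $\delta/(16M)$ from Step 2(d) is used.

```python
import itertools
import random

def ltf(w, theta):
    """LTF x -> sign(w.x - theta), with values in {-1, +1}."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1

def cube(n):
    """All points of {-1, 1}^n (brute force, tiny n only)."""
    return list(itertools.product((-1, 1), repeat=n))

def brute_count(h, n):
    """Stand-in for the approximate counter: exact density of h^{-1}(1)."""
    return sum(1 for x in cube(n) if h(x) == 1) / 2 ** n

def brute_generate(h, n, rng=random):
    """Stand-in for the uniform generator: a uniform point of h^{-1}(1)."""
    return rng.choice([x for x in cube(n) if h(x) == 1])

class VersionSpaceLearner:
    """Toy online learner (NOT Maass-Turan): first candidate still consistent.

    If a counterexample is inconsistent with every candidate, the list
    becomes empty and hypothesis() raises IndexError.
    """
    def __init__(self, candidates):
        self.candidates = list(candidates)

    def hypothesis(self):
        return self.candidates[0]

    def counterexample(self, x, label):
        self.candidates = [h for h in self.candidates if h(x) == label]

def densify_ltf(pos_samples, learner, count, generate,
                eps, delta, p_hat, M, gamma=None):
    """Main loop of the densifier; returns the output hypothesis g."""
    if gamma is None:
        gamma = delta / (16 * M)              # value set in Step 2(d)
    for _ in range(M):                        # Step 2: at most M stages
        h = learner.hypothesis()              # Step 2(a)
        # Step 2(b): a positive sample misclassified by h is a counterexample.
        bad = next((x for x in pos_samples if h(x) != 1), None)
        if bad is not None:
            learner.counterexample(bad, 1)
            continue
        # Steps 2(c)-(d): output h once its estimated density is small enough.
        if count(h) <= p_hat / (gamma * (1 + eps) ** 2):
            return h
        # Step 2(e): a satisfying assignment of h is fed back with label -1.
        learner.counterexample(generate(h), -1)
    return learner.hypothesis()               # Step 3
```

A caller would typically pass `lambda h: brute_count(h, n)` and `lambda h: brute_generate(h, n)` for the two oracles. For very small $n$ the default threshold $\widehat{p}/(\gamma(1+\epsilon)^2)$ usually exceeds 1, so the first hypothesis consistent with $S_+$ is returned immediately; the guarantee is only meaningful asymptotically.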



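Theorem 47. Algorithm $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{den}}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \widehat{p})$ runs in time $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$. If $p \le \widehat{p} < (1+\epsilon)p$
then with probability $1 - \delta$ it outputs a vector $(w, \theta)$ such that $g(x) = \mathrm{sign}(w \cdot x - \theta)$ satisfies conditions
(a) and (b) at the start of Section 4.2.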

Proof. First note that by Claim 45, with probability at least $1 - \delta/4$ over $S_+$ any LTF consistent with $S_+$ will
satisfy condition (a). We will condition on this event and also on the event that each call to the approximate
counting algorithm and to the approximate uniform generator is successful. Since Step 2 involves at most
$M$ iterations, by a union bound, with probability at least $1 - \delta/4$ all calls to $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{count}}$ will be successful, i.e.,
for all $i$ we will have that $p_i/(1+\epsilon) \le \widehat{p}_i \le (1+\epsilon) \cdot p_i$, where $p_i = \Pr_{x \sim \mathcal{U}_n}[h^{(i)}(x) = 1]$. Similarly,
with failure probability at most $\delta/4$, all points $x^{(i)}$ constructed by $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{gen}}$ will be uniformly random over
$(h^{(i)})^{-1}(1)$. Hence, with failure probability at most $3\delta/4$ all three conditions will be satisfied.

Conditioning on the above events, if the algorithm outputs a hypothesis in Step 2(d), this hypothesis
will certainly satisfy condition (b), since $p_i \le (1+\epsilon)\widehat{p}_i \le \widehat{p}/(\gamma \cdot (1+\epsilon)) \le p/\gamma$. In this case, the algorithm
succeeds with probability at least $1 - 3\delta/4$. It remains to show that if the algorithm returns a hypothesis
in Step 3, it will be successful with probability at least $1 - \delta$. To see this, observe that if no execution of
Step 2(e) generates a point $x^{(i)}$ with $f(x^{(i)}) = 1$, all the counterexamples given to $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ are consistent with
$f$. Therefore, by Theorem 46, the hypothesis of Step 3 will be identical to $f$, which trivially satisfies both
conditions.

We claim that with overall probability at least $1 - \delta/4$ all executions of Step 2(e) generate points $x^{(i)}$
with $f(x^{(i)}) = -1$. Indeed, fix an execution of Step 2(e). Since $\widehat{p}_i > \widehat{p}/((1+\epsilon)^2 \cdot \gamma)$, it follows that
$p \le (4\gamma) p_i$. Hence, with probability at least $1 - 4\gamma$ a uniform point $x^{(i)} \sim \mathcal{U}_{(h^{(i)})^{-1}(1)}$ is a negative example
for $f$, i.e., $x^{(i)} \in f^{-1}(-1)$. By a union bound over all stages, our claim holds except with failure probability
$4\gamma \cdot M = \delta/4$, as desired. This completes the proof of correctness.

It remains to analyze the running time. Note that Step 2 is repeated at most $M = O(n^2 \log n)$ times.
Each iteration involves (i) one round of the online learner $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{MT}}$ (this takes $\mathrm{poly}(n)$ time by Theorem 46),
(ii) one call of $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{count}}$ (this takes $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$ time by Theorem 42), and (iii) one call to $\mathcal{A}^{\mathrm{LTF}}_{\mathrm{gen}}$
(this takes $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$ time by Theorem 41). This completes the proof of Theorem 47. □
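
The step in the proof that bounds $p$ by $4\gamma$ times the density of the current hypothesis can be unpacked as follows. This is a sketch added here; it writes $p_i$ for the true density of the hypothesis $h^{(i)}$ and $\widehat{p}_i$ for the counter's estimate of it, and the restriction $\epsilon \le 1/2$ is an assumption introduced for the arithmetic, not stated in the proof.

```latex
\begin{align*}
p_i &\ge \frac{\widehat{p}_i}{1+\epsilon}
     > \frac{\widehat{p}}{(1+\epsilon)^{3}\,\gamma}
     \ge \frac{p}{(1+\epsilon)^{3}\,\gamma}, \\
p   &< (1+\epsilon)^{3}\,\gamma\,p_i \le \tfrac{27}{8}\,\gamma\,p_i < 4\gamma\,p_i
     \qquad (\text{assuming } \epsilon \le 1/2), \\
\Pr_{x \sim \mathcal{U}_{(h^{(i)})^{-1}(1)}}[f(x) = 1]
    &\le \frac{p}{p_i} < 4\gamma,
     \qquad
     4\gamma \cdot M = 4 \cdot \frac{\delta}{16M} \cdot M = \frac{\delta}{4}.
\end{align*}
```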

5 DNFs



In this section we apply our general positive result, Theorem 21, to give a quasipolynomial-time algorithm
for the inverse approximate uniform generation problem for $s$-term DNF formulas. Let $\mathrm{DNF}_{n,s}$ denote
the class of all $s$-term DNF formulas over $n$ Boolean variables (which for convenience we think of as 0/1
variables). Our main result of this section is the following:

Theorem 48. There is an algorithm $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{inv}}$ which is an inverse approximate uniform generation algorithm
for the class $\mathrm{DNF}_{n,s}$. Given input parameters $\epsilon, \delta$ the algorithm runs in time $\mathrm{poly}(n^{\log(s/\epsilon)}, \log(1/\delta))$.

We note that even in the standard uniform distribution learning model the fastest known running time for
learning $s$-term DNF formulas to accuracy $\epsilon$ is $\mathrm{poly}(n^{\log(s/\epsilon)}, \log(1/\delta))$ [Ver90, Val12]. Thus it seems likely
that obtaining a $\mathrm{poly}(n, s, 1/\epsilon)$-time algorithm would require a significant breakthrough in computational
learning theory.

For our application of Theorem 21 for DNFs we shall have $\mathcal{C} = \mathrm{DNF}_{n,s}$ and $\mathcal{C}' = \mathrm{DNF}_{n,t}$ for some
$t$ which we shall specify later. As in the case of LTFs, the literature provides us with three of the four
ingredients that our general approach requires for DNF (approximate uniform generation, approximate
counting, and Statistical Query learning; more on this below), and our main technical contribution is
giving the fourth necessary ingredient, a densifier. Before presenting and analyzing our densifier algorithm
we recall the other three ingredients.

5.1 Tools from the literature. Karp, Luby and Madras [KLM89] have given approximate uniform generation
and approximate counting algorithms for DNF formulas. (We note that [JVV86] give an efficient
algorithm that with high probability outputs an exactly uniform satisfying assignment for DNFs.)

Theorem 49. (Approximate uniform generation for DNFs, [KLM89]) There is an approximate uniform
generation algorithm $\mathcal{A}^{\mathrm{DNF}_{n,t}}_{\mathrm{gen}}$ for the class $\mathrm{DNF}_{n,t}$ that runs in time $\mathrm{poly}(n, t, 1/\epsilon, \log(1/\delta))$.

Theorem 50. (Approximate counting for DNFs, [KLM89]) There is an approximate counting algorithm
$\mathcal{A}^{\mathrm{DNF}_{n,t}}_{\mathrm{count}}$ for the class $\mathrm{DNF}_{n,t}$ that runs in time $\mathrm{poly}(n, t, 1/\epsilon, \log(1/\delta))$.

The fastest known algorithm in the literature for SQ learning $s$-term DNF formulas under arbitrary
distributions runs in time $n^{O(n^{1/3} \log s)} \cdot \mathrm{poly}(1/\epsilon)$ [KS04], which is much more than our desired running
time bound. However, we will see that we are able to use known malicious noise tolerant SQ learning
algorithms for learning sparse disjunctions over $N$ Boolean variables rather than DNF formulas. In more
detail, our densifier will provide us with a set of $N = n^{O(\log(s/\epsilon))}$ many conjunctions which is such that
the target function $f$ is very close to a disjunction (which we call $f'$) over an unknown subset of at most
$s$ of these $N$ conjunctions. Thus intuitively any learning algorithm for disjunctions, run over the "feature
space" of conjunctions provided by the densifier, would succeed if the target function were $f'$, but the target
function is actually $f$ (which is not necessarily exactly a disjunction over these $N$ variables). Fortunately,
known results on the malicious noise tolerance of specific SQ learning algorithms imply that it is in fact
possible to use these SQ algorithms to learn $f$ to high accuracy, as we now explain.

We now state the precise SQ learning result that we will use. The following theorem is a direct consequence
of, e.g., Theorems 5 and 6 of [Dec93] or alternatively of Theorems 5 and 6 of [AD98]:

Theorem 51. (Malicious noise tolerant SQ algorithm for learning sparse disjunctions) Let $\mathcal{C}_{\mathrm{DISJ},k}$ be the
class of all disjunctions of length at most $k$ over $N$ Boolean variables $x_1, \dots, x_N$. There is a distribution-independent
SQ learning algorithm $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$ for $\mathcal{C}_{\mathrm{DISJ},k}$ that has running time $t_1 = \mathrm{poly}(N, 1/\epsilon, \log(1/\delta))$,
uses at most $t_2 = \mathrm{poly}(N)$ time to evaluate each query, and requires tolerance of its queries no smaller
than $\tau = 1/\mathrm{poly}(k, 1/\epsilon)$. The algorithm outputs a hypothesis which is a disjunction over $x_1, \dots, x_N$.
Moreover, there is a fixed polynomial $\ell(\cdot)$ such that algorithm $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$ has the following property: Fix a
distribution $D$ over $\{0,1\}^N$. Let $f$ be an $N$-variable Boolean function which is such that $\Pr_{x \sim D}[f'(x) \ne
f(x)] \le \kappa$, where $f' \in \mathcal{C}_{\mathrm{DISJ},k}$ is some $k$-variable disjunction and $\kappa \le \ell(\epsilon/k) < \epsilon/2$. Then if $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$ is
run with a $\mathrm{STAT}(f, D)$ oracle, with probability $1 - \delta$ it outputs a hypothesis $h$ such that $\Pr_{x \sim D}[h(x) \ne
f'(x)] \le \epsilon/2$, and hence $\Pr_{x \sim D}[h(x) \ne f(x)] \le \epsilon$.

(We note in passing that at the heart of Theorem 51 is an attribute-efficient SQ algorithm for learning
sparse disjunctions. Very roughly speaking, an attribute-efficient SQ learning algorithm is one which can
learn a target function over $N$ variables, which actually depends only on an unknown subset of $k \ll N$
of the variables, using statistical queries for which the minimum value of the tolerance $\tau$ is "large." The
intuition behind Theorem 51 is that since the distance between $f$ and $f'$ is much less than $\tau$, the effect of
using a $\mathrm{STAT}(f, D)$ oracle rather than a $\mathrm{STAT}(f', D)$ oracle is negligible, and hence the SQ algorithm will
succeed whether it is run with $f$ or $f'$ as the target function.)
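
The intuition in the parenthetical remark can be made slightly more concrete. The following is a sketch under an assumption added here, namely that each statistical query is a function $\chi$ of a labeled example with values in $[0,1]$ (the symbol $\chi$ does not appear in the text). Here $\kappa$ denotes the bound on $\Pr_{x \sim D}[f'(x) \ne f(x)]$ from Theorem 51 and $\tau$ the tolerance.

```latex
\Bigl| \mathbf{E}_{x \sim D}\bigl[\chi(x, f(x))\bigr] - \mathbf{E}_{x \sim D}\bigl[\chi(x, f'(x))\bigr] \Bigr|
  \;\le\; \mathbf{E}_{x \sim D}\bigl| \chi(x, f(x)) - \chi(x, f'(x)) \bigr|
  \;\le\; \Pr_{x \sim D}\bigl[f(x) \ne f'(x)\bigr]
  \;\le\; \kappa .
```

Consequently, any answer within $\tau - \kappa$ of the expectation under $f$ is within $\tau$ of the expectation under $f'$, so when $\kappa$ is much smaller than $\tau$ the two oracles are essentially interchangeable.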

5.2 A densifier for $\mathrm{DNF}_{n,s}$ and the proof of Theorem 48. In this subsection we state our main theorem
regarding the existence of densifiers for DNF formulas, Theorem 52, and show how Theorem 48 follows
from this theorem.

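Theorem 52. Let $\gamma(n, s, 1/\epsilon, 1/\delta) = 1/(4n^{2\log(2s/\ell(\epsilon/s))} \log(s/\delta))$. Algorithm $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \widehat{p})$
outputs a collection $S$ of conjunctions $C_1, \dots, C_{|S|}$ and has the following performance guarantee: If
$p \stackrel{\mathrm{def}}{=} \Pr_{x \sim \mathcal{U}_n}[f(x) = 1] \le \widehat{p} < (1+\epsilon)p$, then with probability at least $1 - \delta$, the function
$g(x) \stackrel{\mathrm{def}}{=} \bigvee_{i \in [|S|]} C_i$ satisfies the following:

1. $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \epsilon$;

2. $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \gamma(n, s, 1/\epsilon, 1/\delta)$.

3. There is a DNF $f' = C_{i_1} \vee \cdots \vee C_{i_{s'}}$, which is a disjunction of $s' \le s$ of the conjunctions $C_1, \dots, C_{|S|}$,
such that $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f'(x) \ne f(x)] \le \ell(\epsilon/s)$, where $\ell(\cdot)$ is the polynomial from Theorem 51.

The size of $S$ and the running time of $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \widehat{p})$ is $\mathrm{poly}(n^{\log(s/\epsilon)}, \log(1/\delta))$.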



With a slight abuse of terminology we may rephrase the above theorem as saying that $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}$ is an
$(\epsilon, \gamma, \delta)$-densifier for function class $\mathcal{C} = \mathrm{DNF}_{n,s}$ using class $\mathcal{C}' = \mathrm{DNF}_{n,t}$ where $t = n^{O(\log(s/\epsilon))}$. We
defer the description of Algorithm $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}$ and the proof of Theorem 52 to the next subsection.

Proof of Theorem 48. The proof is essentially just an application of Theorem 21. The only twist is the use
of an SQ disjunction learning algorithm rather than a DNF learning algorithm, but the special properties of
Algorithm $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$ let this go through without a problem.

In more detail, in Step 2(e) of Algorithm $\mathcal{A}^{\mathcal{C}}_{\mathrm{inv}}$ (see Section 3.3), in the execution of Algorithm $\mathcal{A}_{\mathrm{SQ-SIM}}$,
the SQ algorithm that is simulated is the algorithm $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$ run over the feature space $S$ of all conjunctions
that are output by Algorithm $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}$ in Step 1 of Algorithm $\mathcal{A}^{\mathcal{C}}_{\mathrm{inv}}$ (i.e., these conjunctions play the role
of variables $x_1, \dots, x_N$ for the SQ learning algorithm). Property (3) of Theorem 52 and Theorem 51
together imply that the algorithm $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$, run on a $\mathrm{STAT}(f, \mathcal{U}_{g^{-1}(1)})$ oracle with parameters $\epsilon, \delta$, would with
probability $1 - \delta$ output a hypothesis $h'$ satisfying $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[h'(x) \ne f(x)] \le \epsilon$. Hence the hypothesis $h$
that is output by $\mathcal{A}_{\mathrm{SQ-SIM}}$ in Step 2(e) of Algorithm $\mathcal{A}^{\mathcal{C}}_{\mathrm{inv}}$ fulfills the necessary accuracy (with respect to $f$
under $D = \mathcal{U}_{g^{-1}(1)}$) and confidence requirements, and the overall algorithm $\mathcal{A}^{\mathcal{C}}_{\mathrm{inv}}$ succeeds as described in
Theorem 21.

Finally, combining the running time bounds of $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}$ and $\mathcal{A}^{\mathrm{DISJ}}_{\mathrm{SQ}}$ with the time bounds of the other
procedures described earlier, one can straightforwardly verify that the running time of the overall algorithm
$\mathcal{A}^{\mathcal{C}}_{\mathrm{inv}}$ is $\mathrm{poly}(n^{\log(s/\epsilon)}, \log(1/\delta))$. □

5.3 Construction of a densifier for $\mathrm{DNF}_{n,s}$ and proof of Theorem 52. Let $f = T_1 \vee \cdots \vee T_s$ be
the target $s$-term DNF formula, where $T_1, \dots, T_s$ are the terms (conjunctions). The high-level idea of our
densifier is quite simple: If $T_i$ is a term which is "reasonably likely" to be satisfied by a uniform draw of $x$
from $f^{-1}(1)$, then $T_i$ is at least "mildly likely" to be satisfied by $r = 2 \log n$ consecutive independent draws
of $x$ from $f^{-1}(1)$. Such a sequence of draws $x^1, \dots, x^r$ will with high probability uniquely identify $T_i$. By
repeating this process sufficiently many times, with high probability we will obtain a pool $C_1, \dots, C_{|S|}$ of
conjunctions which contains all of the terms $T_i$ that are reasonably likely to be satisfied by a uniform draw
of $x$ from $f^{-1}(1)$. Theorem 52 follows straightforwardly from this.

We give detailed pseudocode for our densifier algorithm below: 



Algorithm $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \widehat{p})$:

Input: Independent samples from $\mathcal{U}_{f^{-1}(1)}$, parameters $\epsilon, \delta > 0$, and a value $1/2^n \le \widehat{p} \le 1$.

Output: If $p \le \widehat{p} < (1+\epsilon)p$, with probability $1 - \delta$ outputs a set $S$ of conjunctions $C_1, \dots, C_{|S|}$ as
described in Theorem 52.

1. Initialize set $S$ to $\emptyset$. Let $\ell(\cdot)$ be the polynomial from Theorem 51.

2. For $i = 1$ to $M = 2n^{2\log(2s/\ell(\epsilon/s))} \log(s/\delta)$, repeat the following:

   (a) Draw $r = 2 \log n$ satisfying assignments $x^1, \dots, x^r$ from $\mathcal{U}_{f^{-1}(1)}$.

   (b) Let $C_i$ be the AND of all literals that take the same value in all $r$ strings $x^1, \dots, x^r$ (note $C_i$
       may be the empty conjunction). We say $C_i$ is a candidate term.

   (c) If the candidate term $C_i$ satisfies $\Pr_{x \sim \mathcal{U}_n}[C_i(x) = 1] \le \widehat{p}$ then add $C_i$ to the set $S$.

3. Output $S$.
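
The candidate-term extraction in Step 2 is simple enough to sketch directly. The following Python is a minimal illustration under stated assumptions: conjunctions are represented as frozensets of `(index, bit)` pairs, the sampler `sample` is a caller-supplied stand-in for draws from $\mathcal{U}_{f^{-1}(1)}$, logarithms are taken base 2, and the number of rounds is left as a parameter rather than fixed to the value of $M$ in the pseudocode. All function names are hypothetical.

```python
import math
import random

def candidate_term(strings):
    """AND of all literals that take the same value in every given 0/1 string.

    A conjunction is a frozenset of (index, bit) pairs; the empty frozenset
    is the empty conjunction, which every input satisfies.
    """
    first = strings[0]
    return frozenset(
        (i, first[i]) for i in range(len(first))
        if all(s[i] == first[i] for s in strings)
    )

def term_density(term):
    """Pr over uniform x in {0,1}^n that the conjunction is satisfied."""
    return 2.0 ** (-len(term))

def satisfies(term, x):
    """True iff x agrees with every literal of the conjunction."""
    return all(x[i] == b for i, b in term)

def dnf_densifier(sample, n, p_hat, num_rounds):
    """Collect candidate terms whose density is at most p_hat (Steps 1-3).

    `sample()` must return one satisfying assignment of f as a 0/1 tuple;
    `num_rounds` plays the role of M and is chosen by the caller here.
    """
    r = max(1, math.ceil(2 * math.log2(n)))   # r = 2 log n draws per round
    pool = set()                              # Step 1
    for _ in range(num_rounds):               # Step 2
        draws = [sample() for _ in range(r)]  # Step 2(a)
        c = candidate_term(draws)             # Step 2(b)
        if term_density(c) <= p_hat:          # Step 2(c)
            pool.add(c)
    return pool                               # Step 3

def pool_disjunction(pool, x):
    """g(x): OR of all conjunctions in the pool."""
    return any(satisfies(c, x) for c in pool)
```

Because every draw satisfies $f$, a candidate built from draws that all happen to satisfy the same term always contains that term's literals; extra literals appear only when the remaining coordinates agree by chance, which is what the choice $r = 2 \log n$ makes unlikely.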



The following crucial claim makes the intuition presented at the start of this subsection precise: 

Claim 53. Suppose $T_j$ is a term in $f$ such that $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[T_j(x) = 1] \ge \ell(\epsilon/s)/(2s)$. Then with probability
at least $1 - \delta/s$, term $T_j$ is a candidate term at some iteration of Step 2 of Algorithm $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}(\mathcal{U}_{f^{-1}(1)}, \epsilon, \delta, \widehat{p})$.

Proof. Fix a given iteration $i$ of the loop in Step 2. With probability at least

$(\ell(\epsilon/s)/(2s))^{2\log n} = (1/n)^{2\log(2s/\ell(\epsilon/s))}$,

all $2\log n$ points $x^1, \dots, x^{2\log n}$ satisfy $T_j$; let us call this event $E$, and condition on $E$ taking place.
We claim that conditioned on $E$, the points $x^1, \dots, x^{2\log n}$ are independent uniform samples drawn from
$T_j^{-1}(1)$. (To see this, observe that each $x^i$ is an independent sample chosen uniformly at random from
$f^{-1}(1) \cap T_j^{-1}(1)$; but $f^{-1}(1) \cap T_j^{-1}(1)$ is identical to $T_j^{-1}(1)$.) Given that $x^1, \dots, x^{2\log n}$ are independent
uniform samples drawn from $T_j^{-1}(1)$, the probability that any literal which is not present in $T_j$ is contained
in $C_i$ (i.e., is satisfied by all $2\log n$ points) is at most $2n/n^2 \le 1/2$. So with overall probability at least
$\frac{1}{2n^{2\log(2s/\ell(\epsilon/s))}}$, the term $T_j$ is a candidate term at iteration $i$. Consequently $T_j$ is a candidate term at some
iteration with probability at least $1 - \delta/s$, by the choice of $M = 2n^{2\log(2s/\ell(\epsilon/s))} \log(s/\delta)$. □
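
The two pieces of arithmetic used in the proof of Claim 53 can be spelled out as follows. This is a sketch added here, assuming logarithms are base 2 and $s/\delta \ge 1$; the abbreviations $a$ and $q$ are introduced only for this derivation.

```latex
\begin{align*}
a^{2\log n} &= 2^{\,2\log n \cdot \log a} = n^{2\log a} = (1/n)^{2\log(1/a)},
  && a = \frac{\ell(\epsilon/s)}{2s}, \\
(1-q)^{M} &\le e^{-qM} = e^{-\log(s/\delta)} \le \frac{\delta}{s},
  && q = \frac{1}{2n^{2\log(2s/\ell(\epsilon/s))}},\quad
     M = 2n^{2\log(2s/\ell(\epsilon/s))}\log(s/\delta).
\end{align*}
```

The first line is the identity behind the displayed probability bound; the second shows why $M$ independent iterations, each succeeding with probability at least $q$, all fail with probability at most $\delta/s$.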

Now we are ready to prove Theorem 52.

Proof of Theorem 52. The claimed running time bound of $\mathcal{A}^{\mathrm{DNF}_{n,s}}_{\mathrm{den}}$ is easily verified, so it remains only to
establish (1)-(3). Fix $\widehat{p}$ such that $p \le \widehat{p} < (1+\epsilon)p$ where $p = \Pr_{x \sim \mathcal{U}_n}[f(x) = 1]$.

Consider any fixed term $T_j$ of $f$ such that $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[T_j(x) = 1] \ge \ell(\epsilon/s)/(2s)$. By Claim 53 we
have that with probability at least $1 - \delta/s$, term $T_j$ is a candidate term at some iteration of Step 2 of the
algorithm. We claim that in step (c) of this iteration the term $T_j$ will in fact be added to $S$. This is because
by assumption we have

$\Pr_{x \sim \mathcal{U}_n}[T_j(x) = 1] \le \Pr_{x \sim \mathcal{U}_n}[f(x) = 1] = p \le \widehat{p}$.

So by a union bound, with probability at least $1 - \delta$ every term $T_j$ in $f$ such that $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[T_j(x) = 1] \ge
\ell(\epsilon/s)/(2s)$ is added to $S$.

Let $L$ be the set of those terms $T_j$ in $f$ that have $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[T_j(x) = 1] \ge \ell(\epsilon/s)/(2s)$. Let $f'$ be the
DNF obtained by taking the OR of all terms in $L$. By a union bound over the (at most $s$) terms that are in
$f$ but not in $f'$, we have $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[f'(x) = 1] \ge 1 - \ell(\epsilon/s)/2$. Since $g$ (as defined in Theorem 52) has
$g(x) = 1$ whenever $f'(x) = 1$, it follows that $\Pr_{x \sim \mathcal{U}_{f^{-1}(1)}}[g(x) = 1] \ge 1 - \ell(\epsilon/s)/2 \ge 1 - \epsilon$, giving item
(1) of the theorem.

For item (2), since $f(x) = 1$ whenever $f'(x) = 1$, we have $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge \Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f'(x) =
1]$. Every $x$ such that $f'(x) = 1$ also has $g(x) = 1$ so to lower bound $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f'(x) = 1]$ it is enough to
upper bound the number of points in $g^{-1}(1)$ and lower bound the number of points in $f'^{-1}(1)$. Since each
$C_i$ that is added to $S$ is satisfied by at most $\widehat{p} 2^n \le (1+\epsilon)p 2^n$ points, we have that $|g^{-1}(1)| \le (1+\epsilon)pM2^n$.
Since at least $1 - \epsilon$ of the points that satisfy $f$ also satisfy $f'$, we have that $|f'^{-1}(1)| \ge p(1-\epsilon)2^n$. Thus we
have $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) = 1] \ge p(1-\epsilon)/((1+\epsilon)pM) = \frac{1-\epsilon}{1+\epsilon} \cdot \frac{1}{M} > \frac{1}{2M}$, giving (2).

Finally, for (3) we have that $f(x) \ne f'(x)$ only on those inputs that have $f(x) = 1$ but $f'(x) = 0$
(because some term outside of $L$ is satisfied by $x$ and no term in $L$ is satisfied by $x$). Even if all such
inputs $x$ lie in $g^{-1}(1)$ (the worst case), there can be at most $(\ell(\epsilon/s)/2)p2^n$ such inputs, and we know that
$|g^{-1}(1)| \ge |f'^{-1}(1)| \ge p(1-\epsilon)2^n$. So we have $\Pr_{x \sim \mathcal{U}_{g^{-1}(1)}}[f(x) \ne f'(x)] \le \frac{\ell(\epsilon/s)/2}{1-\epsilon} \le \ell(\epsilon/s)$, and we
have (3) as desired. □

5.4 Inverse approximate uniform generation for $k$-DNFs. We briefly note that our general approach
immediately yields an efficient inverse approximate uniform generation algorithm for the class of $k$-DNFs
for any constant $k$. Let $k$-DNF denote the class of all $k$-DNFs over $n$ Boolean variables, i.e., DNF formulas
in which each term (conjunction) has at most $k$ literals.

Theorem 54. There is an algorithm $\mathcal{A}^{k\text{-}\mathrm{DNF}}_{\mathrm{inv}}$ which is an inverse approximate uniform generation algorithm
for the class $k$-DNF. Given input parameters $\epsilon, \delta$ the algorithm runs in time $\mathrm{poly}(n^k, 1/\epsilon, \log(1/\delta))$.

For any $k$-DNF $f$ it is easy to see that $\Pr_{x \sim \mathcal{U}_n}[f(x) = 1] \ge 1/2^k$, and consequently the constant 1 function
is a $\gamma$-densifier for $k$-DNF with $\gamma = 1/2^k$. Theorem 54 then follows immediately from Theorem 21,
using the algorithms for approximate uniform generation and counting of DNF formulas mentioned above
[KLM89] together with well-known algorithms for SQ learning $k$-DNF formulas in $\mathrm{poly}(n^k, 1/\epsilon, \log(1/\delta))$
time [Kea98].
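
The density lower bound used above can be checked by brute force on a small instance. The sketch below is illustrative only; it assumes the DNF has at least one term and that no term contains a contradictory pair of literals, and the dictionary encoding of terms and the name `dnf_density` are choices made here.

```python
import itertools

def dnf_density(terms, n):
    """Exact Pr over uniform x in {0,1}^n that the DNF is satisfied.

    Each term is a dict {variable index: required bit}. Brute force over
    all 2^n inputs, so only suitable for small n.
    """
    sat = sum(
        1 for x in itertools.product((0, 1), repeat=n)
        if any(all(x[i] == b for i, b in t.items()) for t in terms)
    )
    return sat / 2 ** n

# A 3-DNF over 6 variables: (x0 AND NOT x1) OR (x2 AND x3 AND NOT x4).
example_terms = [{0: 1, 1: 0}, {2: 1, 3: 1, 4: 0}]
example_density = dnf_density(example_terms, 6)
```

Any single term with at most $k$ literals is already satisfied by a $2^{-k}$ fraction of inputs, so the density of the whole formula is at least $2^{-k}$; in the example, `example_density` is at least $1/8$.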

6 Negative results for inverse approximate uniform generation



In this section, we will prove hardness results for inverse approximate uniform generation problems for 
specific classes C of Boolean functions. As is standard in computational learning theory, our hardness 
results are based on cryptographic hardness assumptions. The hardness assumptions we use are well studied 
assumptions in cryptography such as the strong RSA assumption, Decisional Diffie Hellman problem, and 
hardness of learning parity with noise. 

As was alluded to in the introduction, in light of the standard approach, there are two potential barriers 
to obtaining inverse approximate uniform generation algorithms for a class C of functions. The first is 
that "reconstructing" the object from class C may be hard, and the second is that sampling approximately 
uniform random satisfying assignments from the reconstructed object may be hard. While any hard inverse 
approximate uniform generation problem must be hard because of one of these two potential barriers, we 
emphasize here that even if one of the two steps in the standard approach is shown to be hard, this does 
not constitute a proof of hardness of the overall inverse approximate uniform generation problem, as there
may exist some efficient algorithm for the class C which departs from the standard approach. Indeed, we will
give such an example in Section 13 where we give an efficient algorithm for a specific inverse approximate 
uniform generation problem that does not follow the standard approach. (In fact, for that problem, the second 
step of the standard approach is provably no easier than the well-known graph automorphism problem, which 
has withstood several decades of effort towards even getting a sub-exponential time algorithm.) 

Our hardness results come in two flavors. Our first hardness results, based on signature schemes, are 
for problems where it is provably hard (of course under a computational hardness assumption) to sample 
approximately uniform satisfying assignments. In contrast, our hardness results of the second flavor are 
based on Message Authentication Codes (MACs). We give such a result for a specific class C which has the 
property that it is actually easy to sample uniform satisfying assignments for functions in C; hence, in an 
informal sense, it is the first step in the standard approach that is algorithmically hard for this problem. The 
following subsections describe all of our hardness results in detail. 

6.1 Hardness results based on signature schemes. In this subsection we prove a general theorem,
Theorem 60, which relates the hardness of inverse approximate uniform generation to the existence of certain
secure signature schemes in cryptography. Roughly speaking, Theorem 60 says that if secure signature
schemes exist, then the inverse approximate uniform generation problem is computationally hard for any
class $\mathcal{C}$ which is Levin-reducible from CIRCUIT-SAT. We will use this general result to establish hardness
of inverse approximate uniform generation for several natural classes of functions, including 3-CNF
formulas, intersections of two halfspaces, and degree-2 polynomial threshold functions (PTFs).

We begin by recalling the definition of public key signature schemes. For an extensive treatment of
signature schemes, see [Gol04]. For simplicity, and since it suffices for our purposes, we only consider
schemes with deterministic verification algorithms.

Definition 55. A signature scheme is a triple $(G, S, V)$ of polynomial-time algorithms with the following
properties:

• (Key generation algorithm) $G$ is a randomized algorithm which on input $1^n$ produces a pair $(pk, sk)$
(note that the sizes of both $pk$ and $sk$ are polynomial in $n$).

• (Signing algorithm) $S$ is a randomized algorithm which takes as input a message $m$ from the message
space $\mathcal{M}$, a secret key $sk$ and randomness $r \in \{0,1\}^n$, and outputs a signature $\sigma = S(m, sk, r)$.

• (Verification algorithm) $V$ is a deterministic algorithm such that $V(m, pk, \sigma) = 1$ for every
$\sigma = S(m, sk, r)$.

We will require signature schemes with some special properties which we now define, first fixing some
notation. Let $(G, S, V)$ be a signature scheme. For a message space $\mathcal{M}$ and pair $(pk, sk)$ of public and
secret keys, we define the set $\mathcal{R}_{1,sk}$ of "valid" signed messages as the set of all possible signed messages
$(m, \sigma = S(m, sk, r))$ as $m$ ranges over all of $\mathcal{M}$ and $r$ ranges over all of $\{0,1\}^n$. Similarly, we define
the set $\mathcal{R}_{2,pk}$ of "potential" signed messages as $\mathcal{R}_{2,pk} = \{(m, \sigma) : V(m, pk, \sigma) = 1\}$. Likewise, we
define the set of valid signatures for message $m$, denoted $\mathcal{R}_{1,sk}(m)$, as the set of all possible pairs
$(m, \sigma = S(m, sk, r))$ as $r$ ranges over all of $\{0,1\}^n$, and we define the set of potential signatures for message $m$ as
$\mathcal{R}_{2,pk}(m) = \{(m, \sigma) : V(m, pk, \sigma) = 1\}$.

Definition 56. Let $(G, S, V)$ be a signature scheme and $\mathcal{M}$ be a message space. A pair $(pk, sk)$ of public
and secret keys is said to be $(\delta, \eta)$-special if the following properties hold:

• Let $\mathcal{R}_{1,sk}$ be the set of valid signed messages and $\mathcal{R}_{2,pk}$ be the set of potential signed messages. Then
$|\mathcal{R}_{1,sk}| / |\mathcal{R}_{2,pk}| \ge 1 - \eta$.

• For any fixed pair $(m, \sigma) \in \mathcal{R}_{1,sk}(m)$, we have $\Pr_{r \in \{0,1\}^n}[\sigma = S(m, sk, r)] = 1/|\mathcal{R}_{1,sk}(m)|$.

• Define two distributions $D$ and $D'$ over pairs $(m, \sigma)$ as follows: $D$ is obtained by choosing $m \in_U \mathcal{M}$
and choosing $\sigma \in_U \mathcal{R}_{1,sk}(m)$. $D'$ is the distribution defined to be uniform over the set $\mathcal{R}_{1,sk}$. Then
$d_{TV}(D, D') \le \delta$.
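
To make Definition 56 concrete, the following Python sketch checks its three conditions by brute force on a toy scheme. Everything in it is hypothetical and chosen only for illustration: the functions `sign` and `verify`, the four-element message space, the extra accepted pair `(0, 7)`, and the fact that the keys are left implicit. The toy scheme has no security whatsoever. The sketch also assumes the convention that total variation distance is half the $L_1$ distance.

```python
from collections import Counter
from fractions import Fraction

MESSAGES = range(4)    # toy message space M (hypothetical)
RANDOMNESS = range(4)  # toy stand-in for r in {0,1}^n

def sign(m, r):
    """Toy signer: two valid signatures per message, each produced by half of the r's."""
    return 2 * m + (r % 2)

def verify(m, sigma):
    """Toy verifier: accepts every valid pair, plus one extra 'potential' pair (0, 7)."""
    return sigma in (2 * m, 2 * m + 1) or (m, sigma) == (0, 7)

def tv(p, q):
    """Total variation distance, assuming the (1/2) * L1 convention."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

# R_{1,sk}: valid signed messages; R_{2,pk}: potential signed messages.
valid = {(m, sign(m, r)) for m in MESSAGES for r in RANDOMNESS}
potential = {(m, s) for m in MESSAGES for s in range(8) if verify(m, s)}

# Condition 1: |R_1| / |R_2| >= 1 - eta.
eta = 1 - Fraction(len(valid), len(potential))

# Condition 2: for each m, every valid signature is equally likely over r.
uniform_signing = all(
    len(set(Counter(sign(m, r) for r in RANDOMNESS).values())) == 1
    for m in MESSAGES
)

# Condition 3: D (uniform m, then uniform valid signature) versus D' (uniform on R_1).
D = {}
for m in MESSAGES:
    sigs = {s for (mm, s) in valid if mm == m}
    for s in sigs:
        D[(m, s)] = Fraction(1, len(MESSAGES)) * Fraction(1, len(sigs))
D_prime = {pair: Fraction(1, len(valid)) for pair in valid}
delta = tv(D, D_prime)

print(f"toy key pair is ({delta}, {eta})-special; uniform signing: {uniform_signing}")
```

In this toy example every message has the same number of valid signatures, so $D$ and $D'$ coincide; the only slack comes from the single potential pair that is not valid.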

From now on, in the interest of brevity, $\mathcal{M}$ will denote the "obvious" message space associated with
a signature scheme unless mentioned otherwise. Similarly, the randomness $r$ for the signing algorithm $S$
will always be assumed to be $r \in_U \{0,1\}^n$.

We next recall the standard notion of existential unforgeability under RMA (Random Message Attack): 

Definition 57. A signature scheme $(G, S, V)$ is said to be $(t, \epsilon)$-RMA secure if the following holds: Let
$(pk, sk) \leftarrow G(1^n)$. Let $(m_1, \ldots, m_t)$ be chosen uniformly at random from $\mathcal{M}$. Let $\sigma_i \leftarrow S(m_i, sk, r)$.
Then, for any probabilistic algorithm $A$ running in time $t$,

$$\Pr_{(pk,sk),(m_1,\ldots,m_t),(\sigma_1,\ldots,\sigma_t)}[A(pk, m_1, \ldots, m_t, \sigma_1, \ldots, \sigma_t) = (m', \sigma')] \le \epsilon$$

where $V(m', pk, \sigma') = 1$ and $m' \ne m_i$ for all $i = 1, \ldots, t$.

Next we need to formally define the notion of hardness of inverse approximate uniform generation: 

Definition 58. Let $\mathcal{C}$ be a class of $n$-variable Boolean functions. $\mathcal{C}$ is said to be $(t(n), \epsilon, \delta)$-hard for inverse
approximate uniform generation if there is no algorithm $A$ running in time $t(n)$ which is an $(\epsilon, \delta)$-inverse
approximate uniform generation algorithm for $\mathcal{C}$.

Finally, we will also need the definition of an invertible Levin reduction: 

Definition 59. A binary relation $R$ is said to reduce to another binary relation $R'$ by a time-$t$ invertible
Levin reduction if there are three algorithms $\alpha$, $\beta$ and $\gamma$, each running in time $t(n)$ on instances of length
$n$, with the following property:

• For every $(x, y) \in R$, it holds that $(\alpha(x), \beta(x, y)) \in R'$;

• For every $(\alpha(x), z) \in R'$, it holds that $(x, \gamma(\alpha(x), z)) \in R$.

Furthermore, the functions $\beta$ and $\gamma$ are injective maps with the property that $\gamma(\alpha(x), \beta(x, y)) = y$.

Note that for any class of functions $\mathcal{C}$, we can define the binary relation $R_{\mathcal{C}}$ as follows: $(f, x) \in R_{\mathcal{C}}$ if
and only if $f(x) = 1$ and $f \in \mathcal{C}$. In this section, whenever we say that there is an invertible Levin reduction
from class $\mathcal{C}_1$ to class $\mathcal{C}_2$, we mean that there is an invertible Levin reduction between the corresponding
binary relations $R_{\mathcal{C}_1}$ and $R_{\mathcal{C}_2}$.

6.1.1 A general hardness result based on signature schemes. We now state and prove our main theorem
relating signature schemes to hardness of inverse approximate uniform generation:



Theorem 60. Let $(G, S, V)$ be a $(t, \epsilon)$-RMA secure signature scheme. Suppose that with probability at least
99/100 a random pair $(pk, sk) \leftarrow G(1^n)$ is $(\delta, \eta)$-special. Let $\mathcal{C}$ be a class of $n$-variable Boolean functions
such that there is a Levin reduction from CIRCUIT-SAT to $\mathcal{C}$ running in time $t'(n)$. Let $\kappa_1$ and $\kappa_2$ be such
that $\kappa_1 \le 1 - 2 \cdot (2\eta + \delta + t'(n)/|\mathcal{M}|)$, $\kappa_2 \le 1 - 2t'(n) \cdot (\eta + \delta)$ and $\epsilon \le (1 - \kappa_1)(1 - \kappa_2)/4$. If $t_1(\cdot)$ is
a time function such that $2t_1(t'(n)) \le t(n)$, then $\mathcal{C}$ is $(t_1(n), \kappa_1, \kappa_2)$-hard for inverse approximate uniform
generation.

The high-level idea of the proof is simple: Suppose there were an efficient algorithm for the inverse
approximate uniform generation problem for $\mathcal{C}$. Because of the invertible Levin reduction from CIRCUIT-SAT
to $\mathcal{C}$, there is a signature scheme for which the verification algorithm (using any given public key)
corresponds to a function in $\mathcal{C}$. The signed messages $(m_1, \sigma_1), \ldots, (m_t, \sigma_t)$ correspond to points from $\mathcal{U}_{f^{-1}(1)}$
where $f \in \mathcal{C}$. Now the existence of an efficient algorithm for the inverse approximate uniform generation
problem for $\mathcal{C}$ (i.e. an algorithm which, given points from $\mathcal{U}_{f^{-1}(1)}$, can generate more such points) translates
into an algorithm which, given a sample of signed messages, can generate a new signed message. But this
violates the existential unforgeability under RMA of the signature scheme.

We now proceed to the formal proof. 

Proof. Assume towards a contradiction that there is an algorithm $A_{inv}$ for inverse approximate uniform
generation which runs in time $t_1$ such that with probability $1 - \kappa_2$, the output distribution is $\kappa_1$-close to
the target distribution. If we can show that for any $(\delta, \eta)$-special key pair $(pk, sk)$ the resulting signature
scheme is not $(t, \epsilon)$-RMA secure, then this will result in a contradiction. We will now use algorithm $A_{inv}$ to construct
an adversary which breaks the signature scheme for $(\delta, \eta)$-special key pairs $(pk, sk)$.

Towards this, fix a $(\delta, \eta)$-special key pair $(pk, sk)$ and consider the function $V_{pk} : \mathcal{M} \times \{0,1\}^* \to \{0,1\}$
defined as $V_{pk}(m, \sigma) = V(m, pk, \sigma)$. Clearly, $V_{pk}$ is an instance of CIRCUIT-SAT (i.e. $V_{pk}$ is computed by
a satisfiable polynomial-size Boolean circuit). Since there is an invertible Levin reduction from CIRCUIT-SAT
to $\mathcal{C}$, given $pk$, the adversary in time $t'(n)$ can compute $\Phi_{pk} \in \mathcal{C}$ with the following properties (let $\beta$
and $\gamma$ be the corresponding algorithms in the definition of the Levin reduction):

• For every $(m, \sigma)$ such that $V_{pk}(m, \sigma) = 1$, $\Phi_{pk}(\beta(V_{pk}, (m, \sigma))) = 1$.

• For every $x$ such that $\Phi_{pk}(x) = 1$, $V_{pk}(\gamma(\Phi_{pk}, x)) = 1$.

Recall that the adversary receives signatures $(m_1, \sigma_1), \ldots, (m_{t'(n)}, \sigma_{t'(n)})$. Let $x_i = \beta(V_{pk}, (m_i, \sigma_i))$.
Let $D_x$ be the distribution of $(x_1, \ldots, x_{t'(n)})$. We next make the following claim.

Claim 61. Let $y_1, \ldots, y_{t'(n)}$ be drawn uniformly at random from $\Phi_{pk}^{-1}(1)$ and let $D_y$ be the corresponding
distribution of $(y_1, \ldots, y_{t'(n)})$. Then, $D_y$ and $D_x$ are $t'(n) \cdot (2\eta + \delta)$-close in statistical distance.

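Proof. Note that $D_y$ and $D_x$ are $t'(n)$-way product distributions. If $D_x^{(1)}$ and $D_y^{(1)}$ are the corresponding
marginals on the first coordinate, then $d_{TV}(D_x, D_y) \le t'(n) \cdot d_{TV}(D_x^{(1)}, D_y^{(1)})$. Thus, it suffices to upper
bound $d_{TV}(D_x^{(1)}, D_y^{(1)})$, which we now do.

$$d_{TV}(D_x^{(1)}, D_y^{(1)}) \le \sum_{z \in \mathrm{supp}(D_y^{(1)}) \setminus \mathrm{supp}(D_x^{(1)})} |D_x^{(1)}(z) - D_y^{(1)}(z)| + \sum_{z \in \mathrm{supp}(D_x^{(1)})} |D_x^{(1)}(z) - D_y^{(1)}(z)|.$$

By definition of $(pk, sk)$ being $(\delta, \eta)$-special, we get that

$$\sum_{z \in \mathrm{supp}(D_y^{(1)}) \setminus \mathrm{supp}(D_x^{(1)})} |D_x^{(1)}(z) - D_y^{(1)}(z)| \le \eta.$$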



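To bound the next sum, let $\tau = \Pr[D_y^{(1)} \in \mathrm{supp}(D_x^{(1)})]$. Note that $\tau \ge 1 - \eta$. We have

$$\sum_{z \in \mathrm{supp}(D_x^{(1)})} |D_x^{(1)}(z) - D_y^{(1)}(z)| \le \sum_{z \in \mathrm{supp}(D_x^{(1)})} |\tau D_x^{(1)}(z) - D_y^{(1)}(z)| + (1 - \tau) \sum_{z \in \mathrm{supp}(D_x^{(1)})} D_x^{(1)}(z).$$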



We observe that $D_y^{(1)}$ restricted to $\mathrm{supp}(D_x^{(1)})$ is simply the uniform distribution over the image of the
set $\mathcal{R}_{1,sk}$ and hence is the same as applying the map $\beta$ on the distribution $D'$. Likewise $D_x^{(1)}$ is the same as
applying the map $\beta$ on $D$ (mentioned in Definition 56). Hence, we have that

$$d_{TV}(D_x^{(1)}, D_y^{(1)}) \le 2\eta + d_{TV}(D, D') \le 2\eta + \delta.$$

□
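
The first step of the proof of Claim 61 uses the standard fact that the distance between two $k$-fold product distributions is at most $k$ times the distance between their marginals. The Python sketch below checks this numerically on a small example. The distributions `P` and `Q`, the value of `k`, and the convention that total variation distance is half the $L_1$ distance are all illustrative assumptions.

```python
from itertools import product

def tv(p, q):
    """Total variation distance, assuming the (1/2) * L1 convention."""
    keys = set(p) | set(q)
    return sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in keys) / 2

def power(p, k):
    """Distribution of k independent draws from p, as a dict over k-tuples."""
    out = {}
    for outcome in product(p, repeat=k):
        prob = 1.0
        for x in outcome:
            prob *= p[x]
        out[outcome] = prob
    return out

# Two arbitrary distributions on {0, 1, 2} (hypothetical example data).
P = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {0: 0.4, 1: 0.4, 2: 0.2}
k = 4

single = tv(P, Q)
joint = tv(power(P, k), power(Q, k))
print(f"marginal distance {single:.4f}, joint distance {joint:.4f}, bound {k * single:.4f}")
```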



Now, observe that the instances $x_i$ are each of length at most $t'(n)$. Since the distributions $D_x$ and
$D_y$ are $t'(n) \cdot (2\eta + \delta)$-close, our adversary can run $A_{inv}$ in time $t(n)$ on the examples $x_1, \ldots, x_{t'(n)}$
and succeed with probability $1 - \kappa_2 - t'(n) \cdot (2\eta + \delta) \ge (1 - \kappa_2)/2$ in producing a sampler whose output
distribution is $\kappa_1$-close to $\mathcal{U}_{\Phi_{pk}^{-1}(1)}$. Call this output distribution $Z$. Let $\beta(D)$ denote the distribution obtained
by applying the map $\beta$ on $D$. The proof of Claim 61 shows that $\beta(D)$ is $(2\eta + \delta)$-close to the distribution
$\mathcal{U}_{\Phi_{pk}^{-1}(1)}$. Thus, with probability $(1 - \kappa_2)/2$, $Z$ is $(\kappa_1 + (2\eta + \delta))$-close to the distribution $\beta(D)$. By definition
of $D$, we have

$$\Pr_{(m,\sigma) \in D}[\forall i \in [t'], m_i \ne m] \ge 1 - \frac{t'}{|\mathcal{M}|}.$$



Thus, with probability $\frac{1 - \kappa_2}{2}$,

$$\Pr_{z \in Z}[z = \beta(m, \sigma) \text{ and } \forall i \in [t'], m_i \ne m] \ge 1 - \kappa_1 - (2\eta + \delta) - \frac{t'}{|\mathcal{M}|} \ge \frac{1 - \kappa_1}{2}.$$

Thus, with overall probability $(1 - \kappa_1)(1 - \kappa_2)/4 \ge \epsilon$, the adversary succeeds in producing $z = \beta(m, \sigma)$
such that $\forall i \in [t'], m_i \ne m$. Applying the map $\gamma$ on $(\Phi_{pk}, z)$, the adversary gets the pair $(m, \sigma)$. Also, note
that the total running time of the adversary is $t_1(t'(n)) + t'(n) \le 2t_1(t'(n)) \le t(n)$, which contradicts the
$(t, \epsilon)$-RMA security of the signature scheme. □
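
The structure of the adversary in the proof above can be sketched in Python as follows. This is only a schematic under stated assumptions: `forge`, `cheating_inverse_generator`, the toy `sign` and `verify` functions and the identity maps used for $\beta$ and $\gamma$ are all hypothetical. In particular, the stand-in for $A_{inv}$ cheats by knowing the signing function, since the point of Theorem 60 is that no efficient such algorithm should exist. The proof draws a single sample from the sampler; the retry loop here is an addition that keeps the demonstration deterministic in practice.

```python
import random

def forge(signed, beta, gamma, inverse_generator, rng, max_draws=1000):
    """Try to output a valid signed message whose message is not among `signed`."""
    seen = {m for m, _ in signed}
    examples = [beta(pair) for pair in signed]   # x_i = beta(V_pk, (m_i, sigma_i))
    sampler = inverse_generator(examples)        # hypothesized algorithm A_inv
    for _ in range(max_draws):
        m, sigma = gamma(sampler(rng))           # map the sample z back via gamma
        if m not in seen:
            return m, sigma
    return None

# Toy "unique signature" scheme with no security, for illustration only.
M = range(100)

def sign(m):
    return (7 * m + 3) % 101

def verify(m, sigma):
    return sigma == sign(m)

def cheating_inverse_generator(examples):
    """Stand-in for A_inv: ignores its input and samples uniformly from V^{-1}(1)."""
    support = [(m, sign(m)) for m in M]
    return lambda rng: rng.choice(support)

rng = random.Random(0)
signed = [(m, sign(m)) for m in rng.sample(M, 5)]
forged = forge(signed, lambda pair: pair, lambda z: z, cheating_inverse_generator, rng)
print("forged pair:", forged)
```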

6.1.2 A specific hardness assumption. At this point, at the cost of sacrificing some generality, we con- 
sider a particular instantiation of a signature scheme from the literature which meets our requirements. 
While similar signature schemes can be constructed under many different cryptographic assumptions in the 
literature, we forsake such generality to keep the discussion from getting too cumbersome. 
To state our cryptographic assumption, we need the following notation: 



• $\mathrm{PRIMES}_k$ is the set of $k$-bit prime numbers.

• $\mathrm{RSA}_k$ is the set of all products of two primes of length $\lfloor (k-1)/2 \rfloor$.

The following cryptographic assumption (a slight variant of the standard RSA assumption) appears in
[MRV99].

Assumption 1. The RSA' $s(k)$ assumption: Fix any $m \in \mathrm{RSA}_k$ and let $x \in_U \mathbb{Z}_m^*$ and $p \in_U \mathrm{PRIMES}_{k+1}$.
Let $A$ be any probabilistic algorithm running in time $s(k)$. Then,

$$\Pr_{(x,p)}[A(m, x, p) = y \text{ and } y^p = x \pmod{m}] \le \frac{1}{s(k)}.$$

As mentioned in [MRV99], given the present state of computational number theory, it is plausible to
conjecture the RSA' $s(k)$ assumption for $s(k) = 2^{k^{\delta}}$ for some absolute constant $\delta > 0$. For the sake of
conciseness, for the rest of this section we write "Assumption 1 holds true" to mean that Assumption 1
holds true with $s(k) = 2^{k^{\delta}}$ for some fixed constant $\delta > 0$. (We note, though, that all our hardness results go
through giving superpolynomial hardness using only $s(k) = k^{\omega(1)}$.)

Micali et al. [MRV99] give a construction of a "unique signature scheme" using Assumption 1:

Theorem 62. If Assumption 1 holds true, then there is a $(t = 2^{n^{\delta}}, \epsilon = 1/t)$-RMA secure signature scheme
$(G, S, V)$ with the following property: For any message $m \in \mathcal{M}$, there do not exist $\sigma_1 \ne \sigma_2$ such that
$V(m, \sigma_1) = V(m, \sigma_2) = 1$. In this scheme the signing algorithm $S$ is deterministic and the message space
$\mathcal{M}$ is of size $2^{n^{\delta}}$.

The above theorem says that under the RSA' $s(k)$ assumption, there is a deterministic signature scheme
such that there is only one signature $\sigma_m$ for every message $m$, and for every message $m$ the only accepting
input for $V$ is $(m, \sigma_m)$. As a consequence, the signature scheme in Theorem 62 has the property that every
$(pk, sk)$ pair that can be generated by $G$ is $(0, 0)$-special.

Remark 63. It is important to note here that constructions of $(0, 0)$-special signature schemes are abundant
in the literature. A partial list follows: Lysyanskaya [Lys02] constructed a deterministic $(0, 0)$-special
signature scheme using a strong version of the Diffie-Hellman assumption. Hohenberger and Waters [HW10]
constructed a scheme with a similar guarantee using a variant of the Diffie-Hellman assumption on bilinear
groups. In fact, going back much further, Cramer and Shoup [CS00, Fis03] show that using the Strong RSA
assumption, one can get a $(0, 0)$-special signature scheme (which however is not deterministic). We remark
that the scheme as stated in [CS00] is not $(0, 0)$-special in any obvious sense, but the more efficient version
in [Fis03] can be easily verified to be $(0, 0)$-special. Throughout this section, for the sake of simplicity, we
use the signature scheme in Theorem 62.

Instantiating Theorem 60 with the signature scheme from Theorem 62, we obtain the following
corollary:

Corollary 64. Suppose that Assumption 1 holds true. Then the following holds: Let $\mathcal{C}$ be a function class
such that there is a polynomial time ($n^k$-time) invertible Levin reduction from CIRCUIT-SAT to $\mathcal{C}$. Then
$\mathcal{C}$ is $(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard for inverse approximate uniform generation for some constant $c > 0$
(depending only on the "$\delta$" in Assumption 1 and on $k$).

6.1.3 Inverse approximate uniform generation hardness results for specific function classes whose
satisfiability problem is NP-complete. In this subsection we use Corollary 64 to prove hardness results for
inverse approximate uniform generation for specific function classes $\mathcal{C}$ for which there are invertible Levin
reductions from CIRCUIT-SAT to $\mathcal{C}$.

Recall that a 3-CNF formula is a conjunction of clauses (disjunctions) of length 3. The following fact
can be easily verified by inspecting the standard reduction from CIRCUIT-SAT to 3-CNF-SAT.

Fact 65. There is a polynomial time invertible Levin reduction from CIRCUIT-SAT to 3-CNF-SAT.



As a corollary, we have the following result. 

Corollary 66. If Assumption 1 holds true, then there exists an absolute constant $c > 0$ such that the class
3-CNF is $(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard for inverse approximate uniform generation.

Corollary 66 is interesting in light of the well known fact that the class of all 3-CNF formulas is
efficiently PAC learnable from uniform random examples (in fact under any distribution).

We next observe that the problem of inverse approximate uniform generation remains hard even for 
3-CNF formulas in which each variable occurs a bounded number of times. To prove this we will use the 
fact that polynomial time invertible Levin reductions compose: 

Fact 67. If there is a polynomial time invertible Levin reduction from CIRCUIT-SAT to $\mathcal{C}$ and a polynomial
time Levin reduction from $\mathcal{C}$ to $\mathcal{C}_1$, then there is a polynomial time invertible Levin reduction from
CIRCUIT-SAT to $\mathcal{C}_1$.

The following theorem says that the inverse approximate uniform generation problem remains hard for 
the class of all 3-CNF formulas in which each variable occurs at most 4 times (hereafter denoted 3,4-CNF). 

Theorem 68. If Assumption 1 holds true, then there exists an absolute constant $c > 0$ such that
3,4-CNF-SAT is $(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard for inverse approximate uniform generation.

Proof. Tovey [Tov84] shows that there is a polynomial time invertible Levin reduction from 3-CNF-SAT
to 3,4-CNF-SAT. Using Fact 67, we have a polynomial time Levin reduction from CIRCUIT-SAT to
3,4-CNF-SAT. Now the result follows from Corollary 64. □

The next theorem shows that the class of all intersections of two halfspaces over n Boolean variables is 
hard for inverse approximate uniform generation. 

Theorem 69. If Assumption 1 holds true, then there exists an absolute constant $c > 0$ such that $\mathcal{C}$ = {all
intersections of two halfspaces over $n$ Boolean variables} is $(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard for inverse
approximate uniform generation.

Proof. We recall that the SUBSET-SUM problem is defined as follows: An instance $\Phi$ is defined by
positive integers $w_1, \ldots, w_n, s > 0$. A satisfying assignment for this instance is given by $x \in \{0,1\}^n$ such
that $\sum_{i=1}^n w_i x_i = s$. It is well known that the SUBSET-SUM problem is NP-complete and it is folklore that
there is an invertible Levin reduction from 3-SAT to SUBSET-SUM. However, since it is somewhat difficult
to find this reduction explicitly in the literature, we outline such a reduction.

To describe the reduction, we first define 1-in-3-SAT. An instance $\Psi$ of 1-in-3-SAT is defined over
Boolean variables $x_1, \ldots, x_n$ with the following constraints: The $i$-th constraint is defined by a subset of at
most three literals over $x_1, \ldots, x_n$. An assignment to $x_1, \ldots, x_n$ satisfies $\Psi$ if and only if for every constraint
there is exactly one literal which is set to true. Schaefer [Sch78] showed that 3-SAT reduces to 1-in-3-SAT
in polynomial time, and the reduction can be easily verified to be an invertible Levin reduction. Now the
standard textbook reduction from 3-SAT to SUBSET-SUM (which can be found e.g. in [Pap94]) applied to
instances of 1-in-3-SAT, can be easily seen to be a polynomial time invertible Levin reduction. By Fact 67
we thus have a polynomial time invertible Levin reduction from 3-CNF-SAT to SUBSET-SUM.

With this reduction in hand, it remains only to observe that any instance of SUBSET-SUM is
also an instance of "intersection of two halfspaces," simply because $\sum_{i=1}^n w_i x_i = s$ if and only if
$s \le \sum_{i=1}^n w_i x_i \le s$. Thus, there is a polynomial time invertible Levin reduction from 3-CNF-SAT to the class
of all intersections of two halfspaces. This finishes the proof. □
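
The last observation in the proof of Theorem 69 is simple enough to check directly. The following Python sketch writes a SUBSET-SUM instance as an intersection of two halfspaces and confirms, by brute force, that the two have the same satisfying assignments. The helper names and the example weights are hypothetical.

```python
from itertools import product

def halfspace(weights, theta):
    """Return the 0/1-valued function x -> [w . x >= theta]."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) >= theta)

def subset_sum_as_halfspaces(weights, s):
    """Express w . x = s as the intersection of w . x >= s and -w . x >= -s."""
    upper = halfspace(weights, s)                   # w . x >= s
    lower = halfspace([-w for w in weights], -s)    # w . x <= s
    return lambda x: upper(x) & lower(x)

weights, s = [3, 5, 6, 7], 12   # example instance (hypothetical)
f = subset_sum_as_halfspaces(weights, s)

cube = list(product((0, 1), repeat=len(weights)))
solutions = [x for x in cube if f(x)]
brute_force = [x for x in cube if sum(w * xi for w, xi in zip(weights, x)) == s]
print(solutions)
```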
6.1.4 A hardness result where the satisfiability problem is in P. So far all of our hardness results have
been for classes $\mathcal{C}$ of NP-complete languages. As Theorem 60 requires a reduction from CIRCUIT-SAT to
$\mathcal{C}$, this theorem cannot be directly used to prove hardness for classes $\mathcal{C}$ which are not NP-hard. We next
give an extension of Theorem 60 which can apply to classes $\mathcal{C}$ for which the satisfiability problem is in P.
Using this result we will show hardness of inverse approximate uniform generation for
MONOTONE-2-CNF-SAT. (Recall that a monotone 2-CNF formula is a conjunction of clauses of the form $x_i \vee x_j$, with no
negations; such a formula is trivially satisfiable by the all-true assignment.)

We begin by defining a notion of invertible one-many reductions that we will need.

Definition 70. CIRCUIT-SAT is said to have an $\eta$-almost invertible one-many reduction to a function class
$\mathcal{C}$ if the following conditions hold:

• There is a polynomial time computable function $f$ such that given an instance $\Phi$ of CIRCUIT-SAT
(i.e. $\Phi$ is a satisfiable circuit), $\Psi = f(\Phi)$ is an instance of $\mathcal{C}$ (i.e. $\Psi \in \mathcal{C}$ and $\Psi$ is satisfiable).

• Fix any instance $\Phi$ of CIRCUIT-SAT and let $\mathcal{A} = \Psi^{-1}(1)$ denote the set of satisfying assignments of
$\Psi$. Then $\mathcal{A}$ can be partitioned into sets $\mathcal{A}_1$ and $\mathcal{A}_2$ such that $|\mathcal{A}_2|/|\mathcal{A}| \le \eta$ and there is an efficiently
computable function $g : \mathcal{A}_1 \to \Phi^{-1}(1)$ such that $g(x)$ is a satisfying assignment of $\Phi$ for every
$x \in \mathcal{A}_1$.

• For every $y$ which is a satisfying assignment of $\Phi$, the number of pre-images of $y$ under $g$ is exactly
the same, and the uniform distribution over $g^{-1}(y)$ is polynomial time samplable.

We next state the following simple claim which will be helpful later. 

Claim 71. Suppose there is an $\eta$-almost invertible one-many reduction from CIRCUIT-SAT to $\mathcal{C}$. Let $f$ and
$g$ be the functions from Definition 70. Let $\Phi$ be an instance of CIRCUIT-SAT and let $\Psi = f(\Phi)$ be the
corresponding instance of $\mathcal{C}$. Define distributions $D_1$ and $D_2$ as follows:

• A draw from $D_1$ is obtained by choosing $y$ uniformly at random from $\Phi^{-1}(1)$ and then outputting $z$
uniformly at random from $g^{-1}(y)$.

• A draw from $D_2$ is obtained by choosing $z'$ uniformly at random from $\Psi^{-1}(1)$.

Then we have $d_{TV}(D_1, D_2) \le \eta$.

Proof. This is an immediate consequence of the fact that $D_1$ is uniform over the set $\mathcal{A}_1$ while $D_2$ is uniform
over the set $\mathcal{A}$ (from Definition 70). □
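
Claim 71 can be checked exactly on a small hypothetical instance. In the Python sketch below, the sets `A1` and `A2`, the map `g` and its two equal-size fibres are all made up for illustration, and the total variation distance is taken to be half the $L_1$ distance (an assumed convention). Under that convention the distance between $D_1$ and $D_2$ comes out to exactly $|\mathcal{A}_2|/|\mathcal{A}|$.

```python
from fractions import Fraction

def tv(p, q):
    """Total variation distance, assuming the (1/2) * L1 convention."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

# Hypothetical instance: g maps A1 onto {"a", "b"} with fibres of equal size.
g = {"a0": "a", "a1": "a", "b0": "b", "b1": "b"}
A1 = set(g)
A2 = {"junk"}          # satisfying assignments of Psi outside the domain of g
A = A1 | A2

# D1: pick y uniformly from the image of g, then z uniformly from g^{-1}(y).
images = sorted(set(g.values()))
D1 = {}
for y in images:
    fibre = [z for z in g if g[z] == y]
    for z in fibre:
        D1[z] = Fraction(1, len(images)) * Fraction(1, len(fibre))

# D2: uniform over all of A.
D2 = {z: Fraction(1, len(A)) for z in A}

eta = Fraction(len(A2), len(A))
distance = tv(D1, D2)
print(f"d_TV(D1, D2) = {distance}, eta = {eta}")
```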

We next have the following extension of Corollary 64:

Theorem 72. Suppose that Assumption 1 holds true. Then if $\mathcal{C}$ is a function class such that there is an
$\eta$-almost invertible one-many reduction (for $\eta = 2^{-\Omega(n)}$) from CIRCUIT-SAT to $\mathcal{C}$, then $\mathcal{C}$ is
$(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard for inverse approximate uniform generation for some absolute constant $c > 0$.

Proof. The proof is similar to the proof of Corollary 64. Assume towards a contradiction that there is an
algorithm $A_{inv}$ for inverse approximate uniform generation for $\mathcal{C}$ which runs in time $t_1$ such that with
probability $1 - \kappa_2$, the output distribution is $\kappa_1$-close to the target distribution. (We will set $t_1$, $\kappa_1$ and $\kappa_2$
later to $2^{n^c}$, $1 - 2^{-n^c}$ and $1 - 2^{-n^c}$ respectively.)

Let $(G, S, V)$ be the RMA-secure signature scheme constructed in Theorem 62. Note that $(G, S, V)$ is
a $(T, \epsilon)$-RMA secure signature scheme where $T = 2^{n^{\delta}}$, $\epsilon = 1/T$ and $|\mathcal{M}| = 2^{n^{\mu}}$ for constants $\delta, \mu > 0$.
Let $(pk, sk)$ be a choice of key pair. We will use $A_{inv}$ to contradict the security of $(G, S, V)$. Towards this,
consider the function $V_{pk} : \mathcal{M} \times \{0,1\}^* \to \{0,1\}$ defined as $V_{pk}(m, \sigma) = V(m, pk, \sigma)$. Clearly, $V_{pk}$ is an
instance of CIRCUIT-SAT. Consider the $\eta$-almost invertible one-many reduction from CIRCUIT-SAT to $\mathcal{C}$. Let $\alpha$
and $\beta$ denote the functions $f$ and $g$ of Definition 70. Let $\Psi = \alpha(V_{pk})$ and let $\mathcal{A}$, $\mathcal{A}_1$ and $\mathcal{A}_2$ have the same
meaning as in Definition 70. The adversary receives message-signature pairs $(m_1, \sigma_1), \ldots, (m_{t_1}, \sigma_{t_1})$ where
$m_1, \ldots, m_{t_1}$ are chosen independently at random from $\mathcal{M}$. For any $i$, $(m_i, \sigma_i)$ is a satisfying assignment of
$V_{pk}$. By definition, in time $t_2 = t_1 \cdot \mathrm{poly}(n)$, the adversary can sample $(z_1, \ldots, z_{t_1})$ such that $z_1, \ldots, z_{t_1}$
are independent and $z_i \sim \mathcal{U}_{\beta^{-1}(m_i, \sigma_i)}$. Note that this means that each $z_i$ is an independent sample from
$\mathcal{A}_1$ and $|z_i| = \mathrm{poly}(n)$. Note that $(z_1, \ldots, z_{t_1})$ is a $t_1$-fold product distribution such that if $D'$ denotes the
distribution of $z_i$, then by Claim 71, $d_{TV}(D', \mathcal{U}_{\Psi^{-1}(1)}) \le \eta$. Hence, if $D$ is the distribution of $(z_1, \ldots, z_{t_1})$,
then $d_{TV}(D, \mathcal{U}_{\Psi^{-1}(1)}^{t_1}) \le t_1 \eta$.

Hence, the adversary can now run $A_{inv}$ on the samples $z_1, \ldots, z_{t_1}$ and, as long as $1 - \kappa_2 - t_1 \eta \ge
(1 - \kappa_2)/2$, succeeds in producing a sampler with probability $(1 - \kappa_2)/2$ whose output distribution (call
it $Z$) is $\kappa_1$-close to the distribution $\mathcal{U}_{\Psi^{-1}(1)}$. Note that as $\eta = 2^{-\Omega(n)}$, for any $c > 0$, $t_1 = 2^{n^c}$ and
$\kappa_2 = 1 - 2^{-n^c}$ satisfies this condition. Hence, we get that $d_{TV}(Z, D') \le \kappa_1 + \eta$. Now, observe that

$$\Pr_{\rho \in D'}[\beta(\rho) = (m, \sigma) \text{ and } m \ne m_i] = 1 - \frac{t_1}{|\mathcal{M}|}.$$

The above uses the fact that every element in the range of $\beta$ has the same number of pre-images. This of
course implies that

$$\Pr_{\rho \in Z}[\beta(\rho) = (m, \sigma) \text{ and } m \ne m_i] \ge 1 - \frac{t_1}{|\mathcal{M}|} - (\kappa_1 + \eta).$$

Again, as long as $\kappa_1 \le 1 - 2(\eta + t_1/|\mathcal{M}|)$, the adversary succeeds in getting a valid message-signature
pair $(m, \sigma)$ with $m \ne m_i$ for any $1 \le i \le t_1$ with probability $(1 - \kappa_1)/2$. Again, we can ensure
$\kappa_1 \le 1 - 2(\eta + t_1/|\mathcal{M}|)$ by choosing $c$ sufficiently small compared to $\mu$. The total probability of success is
$(1 - \kappa_1)(1 - \kappa_2)/4$ and the total running time is $t_1(\mathrm{poly}(n)) + \mathrm{poly}(n)$. Again, if $c$ is sufficiently small
compared to $\mu$ and $\delta$, then the total running time is at most $t_1(\mathrm{poly}(n)) + \mathrm{poly}(n) < T$ and the success
probability is at least $(1 - \kappa_1)(1 - \kappa_2)/4 > \epsilon$, resulting in a contradiction. □

We now demonstrate a polynomial time $\eta$-almost invertible one-many reduction from CIRCUIT-SAT to
MONOTONE-2-CNF-SAT for $\eta = 2^{-\Omega(n)}$. The reduction uses the "blow-up" idea used to prove hardness of approximate
counting for MONOTONE-2-CNF-SAT in [JVV86]. We will closely follow the instantiation of this
technique in [Wat12].

Lemma 73. There is a polynomial time $\eta$-almost invertible one-many reduction from CIRCUIT-SAT to
MONOTONE-2-CNF-SAT where $\eta = 2^{-\Omega(n)}$.

Proof. We begin by noting the following simple fact. 

Fact 74. If there is a polynomial time invertible Levin reduction from CIRCUIT-SAT to a class $\mathcal{C}_1$ and an
$\eta$-almost invertible one-many reduction from $\mathcal{C}_1$ to $\mathcal{C}_2$, then there is a polynomial time $\eta$-almost invertible
one-many reduction from CIRCUIT-SAT to $\mathcal{C}_2$.

Since there is an invertible Levin reduction from CIRCUIT-SAT to 3-CNF-SAT, by virtue of Fact 74
it suffices to demonstrate a polynomial time $\eta$-almost invertible one-many reduction from 3-CNF-SAT to
MONOTONE-2-CNF-SAT. To do this, we first construct an instance of VERTEX-COVER from the
3-CNF-SAT instance. Let $\Phi = \bigwedge_{i=1}^m \Phi_i$ be the instance of 3-CNF-SAT. Construct an instance of
VERTEX-COVER by introducing seven vertices for each clause $\Phi_i$ (one corresponding to every satisfying assignment
of $\Phi_i$). Now, put an edge between any two vertices of this graph if the corresponding assignments to the
variables of $\Phi$ conflict on some variable. We call this graph $G$. We observe the following properties of this
graph:

• $G$ has exactly $7m$ vertices.

• Every vertex cover of $G$ has size at least $6m$.

• There is an efficiently computable and invertible injection $\ell$ between the satisfying assignments of
$\Phi$ and the vertex covers of $G$ of size $6m$. To get the vertex cover corresponding to a satisfying
assignment, for every clause $\Phi_i$, include the six vertices in the vertex cover which conflict with the
satisfying assignment.

We next do the blow-up construction. We create a new graph $G'$ by replacing every vertex of $G$ with a
cloud of $10m$ vertices, and for every edge in $G$ we create a complete bipartite graph between the
corresponding clouds in $G'$. Clearly, the size of the graph $G'$ is polynomial in the size of the 3-CNF-SAT formula. We
define a map $g_1$ between vertex covers of $G'$ and vertex covers of $G$ as follows: Let $S'$ be a vertex cover of
$G'$. We define the set $S = g_1(S')$ in the following way. For every vertex $v$ in the graph $G$, if all the vertices
in the corresponding cloud in $G'$ are in $S'$, then include $v \in S$, else do not include $v$ in $S$. It is easy to
observe that $g_1$ maps vertex covers of $G'$ to vertex covers of $G$. It is also easy to observe that a vertex cover
of $G$ of size $s$ has $(2^{10m} - 1)^{7m-s}$ pre-images under $g_1$.
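
The pre-image count for the blow-up map can be verified by brute force on a tiny graph. The Python sketch below uses a triangle in place of $G$ and clouds of size $k = 2$ in place of $10m$, so that a cover of size $s$ in a graph with $N$ vertices should have $(2^k - 1)^{N - s}$ pre-images. The graph, the cloud size and the helper names are all hypothetical choices made to keep the enumeration small.

```python
from collections import Counter
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_vertex_cover(edges, chosen):
    return all(u in chosen or v in chosen for u, v in edges)

def blow_up(vertices, edges, k):
    """Replace each vertex by a cloud of k copies and each edge by a complete bipartite graph."""
    clouds = {v: [(v, i) for i in range(k)] for v in vertices}
    new_edges = [(a, b) for u, v in edges for a in clouds[u] for b in clouds[v]]
    return clouds, new_edges

def g1(clouds, cover):
    """Keep vertex v exactly when its whole cloud lies in the given cover of G'."""
    return frozenset(v for v, cloud in clouds.items() if all(c in cover for c in cloud))

vertices, edges, k = [0, 1, 2], [(0, 1), (1, 2), (0, 2)], 2   # a triangle, clouds of size 2
clouds, new_edges = blow_up(vertices, edges, k)
new_vertices = [c for cloud in clouds.values() for c in cloud]

# Count, for each vertex cover S of G, the vertex covers S' of G' with g1(S') = S.
preimages = Counter(
    g1(clouds, s) for s in subsets(new_vertices) if is_vertex_cover(new_edges, s)
)
covers = [s for s in subsets(vertices) if is_vertex_cover(edges, s)]
for s in covers:
    print(sorted(s), preimages[s], (2**k - 1) ** (len(vertices) - len(s)))
```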

Now, observe that we can construct a MONOTONE-2-CNF-SAT formula $\Psi$ which has a variable
corresponding to every vertex in $G'$ and every subset $S'$ of $G'$ corresponds to a truth assignment $y_{S'}$ to $\Psi$ such
that $\Psi(y_{S'}) = 1$ if and only if $S'$ is a vertex cover of $G'$. Because of this correspondence, we can construct
a map $g_1'$ which maps satisfying assignments of $\Psi$ to vertex covers of $G$. Further, a vertex cover of size $s$ in
graph $G$ has $(2^{10m} - 1)^{7m-s}$ pre-images under $g_1'$. Since the total number of vertex covers of $G$ of size $s$ is
at most $\binom{7m}{s}$, the total number of satisfying assignments of $\Psi$ which map to vertex covers of $G$ of size more
than $6m$ can be bounded by:

$$\sum_{s=6m+1}^{7m} \binom{7m}{s} \cdot (2^{10m} - 1)^{7m-s} \le m \cdot \binom{7m}{6m+1} \cdot (2^{10m} - 1)^{m-1} \le (2^{10m} - 1)^m \cdot \frac{m \cdot 2^{7m}}{2^{10m} - 1}.$$

On the other hand, since $\Phi$ has at least one satisfying assignment, $G$ has at least one vertex cover
of size $6m$ and hence the total number of satisfying assignments of $\Psi$ which map to vertex covers of $G$
of size $6m$ is at least $(2^{10m} - 1)^m$. Thus, if we let $\mathcal{A}$ denote the set of satisfying assignments of $\Psi$ and
$\mathcal{A}_1$ be the set of satisfying assignments of $\Psi$ which map to vertex covers of $G$ of size exactly $6m$ (under
$g_1$), then $|\mathcal{A}_1|/|\mathcal{A}| \ge 1 - 2^{-\Omega(n)}$. Next, notice that we can define the map $g$ mapping $\mathcal{A}_1$ to the satisfying
assignments of $\Phi$ in the following manner: $g(x) = \ell^{-1}(g_1(x))$. It is easy to see that this map satisfies all
the requirements of the map $g$ from Definition 70, which concludes the proof. □

Combining Lemma 73 with Theorem 72, we have the following corollary.

Corollary 75. If Assumption 1 holds true, then MONOTONE-2-CNF-SAT is $(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard
for inverse approximate uniform generation for some absolute constant $c > 0$.

As a consequence of the above result, we also get hardness for inverse approximate uniform generation
of degree-2 polynomial threshold functions (PTFs); these are functions of the form $\mathrm{sign}(q(x))$ where $q(x)$
is a degree-2 multilinear polynomial over $\{0,1\}^n$.

Corollary 76. If Assumption 1 holds true, then the class of all $n$-variable degree-2 polynomial threshold
functions is $(2^{n^c}, 1 - 2^{-n^c}, 1 - 2^{-n^c})$-hard for inverse approximate uniform generation for some absolute
constant $c > 0$.

Proof. This follows immediately from the fact that every monotone 2-CNF formula can be expressed as a
degree-2 PTF. To see this, note that if $\Phi = \bigwedge_{i=1}^m (x_{i1} \vee x_{i2})$ where each $x_{ij}$ is a 0/1 variable, then $\Phi$ is
true if and only if $\sum_{i=1}^m (x_{i1} + x_{i2} - x_{i1} \cdot x_{i2}) \ge m$. This finishes the proof. □
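
The identity used in the proof of Corollary 76 can be confirmed by exhaustive enumeration. In the Python sketch below, the clause list and the function names are hypothetical; each clause $(i, j)$ stands for $x_i \vee x_j$, and each summand $x_i + x_j - x_i x_j$ equals the value of that clause on 0/1 inputs.

```python
from itertools import product

def monotone_2cnf(clauses, x):
    """Evaluate the conjunction of clauses (x_i OR x_j) on a 0/1 assignment x."""
    return all(x[i] or x[j] for i, j in clauses)

def degree2_ptf(clauses, x):
    """Evaluate the threshold test sum_i (x_i1 + x_i2 - x_i1 * x_i2) >= m."""
    q = sum(x[i] + x[j] - x[i] * x[j] for i, j in clauses)
    return q >= len(clauses)

clauses = [(0, 1), (1, 2), (0, 3)]   # example formula over 4 variables (hypothetical)
agree = all(
    monotone_2cnf(clauses, x) == degree2_ptf(clauses, x)
    for x in product((0, 1), repeat=4)
)
print("2-CNF and degree-2 PTF agree on every input:", agree)
```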
6.2 Hardness results based on Message Authentication Codes. All of the previous hardness results
intuitively correspond to the case when the second step of our "standard approach" is algorithmically hard.
Indeed, consider a class $\mathcal{C}$ of functions that has an efficient approximate uniform generation algorithm.
Unless P = NP there cannot be any Karp reduction from CIRCUIT-SAT to $\mathcal{C}$ (this would contradict the
NP-completeness of CIRCUIT-SAT) and hence Theorem 60 is not applicable in this setting. In fact, even
for $\eta = 1 - 1/\mathrm{poly}(n)$ there cannot be any $\eta$-almost invertible one-many reduction from CIRCUIT-SAT to
$\mathcal{C}$ unless P = NP. This makes Theorem 72 inapplicable in this setting. Thus, to prove hardness results for
classes that have efficient approximate uniform generation algorithms, we need some other approach.

In this section we show that Message Authentication Codes (MACs) can be used to establish hardness of
inverse approximate uniform generation for such classes. We begin by defining MACs. (We remark that we
use a restricted definition which is sufficient for us; for the most general definition, see [Gol04].)

Definition 77. A Message Authentication Code (MAC) is a triple $(G, T, V)$ of polynomial-time algorithms
with the following properties:

• (Key generation algorithm) $G(\cdot)$ is a randomized algorithm which on input $1^n$ produces a secret key
$sk$;

• (Tagging algorithm) $T$ is a randomized algorithm which takes as input message $m$, secret key $sk$ and
randomness $r$ and outputs $\sigma \leftarrow T(m, sk, r)$;

• (Verification algorithm) $V$ is a deterministic algorithm which takes as input message $m$, secret key
$sk$ and $\sigma$. If $\sigma = T(m, sk, r)$ for some $r$ then $V(m, sk, \sigma) = 1$.

For the purposes of our hardness results we require MACs with some special properties. While our
hardness results can be derived from slightly more general MACs than those we specify below, we forsake
some generality for the sake of clarity. For a MAC $(G, T, V)$ and a choice of secret key $sk$, we say $\sigma$ is a
valid tag for message $m$ if there exists $r$ such that $\sigma = T(m, sk, r)$. Likewise, we say that $\sigma$ is a potential
tag for message $m$ if $V(m, sk, \sigma) = 1$.

Definition 78. A Message Authentication Code $(G, T, V)$ over a message space $\mathcal{M}$ is said to be special if
the following conditions hold: For any secret key $sk$,

• For every message $m \in \mathcal{M}$, the set of valid tags is identical to the set of potential tags.

• For every two messages $m_1 \ne m_2$ and every $\sigma_1, \sigma_2$ such that $\sigma_i$ is a valid tag for $m_i$, we have
$\Pr_r[T(m_1, sk, r) = \sigma_1] = \Pr_r[T(m_2, sk, r) = \sigma_2]$. In particular, the cardinality of the set of valid
tags for $m$ is the same for all $m$.

We next define the standard notion of security under Random Message attacks for MACs. As before, 
from now onwards, we will assume implicitly that M. is the message space. 

Definition 79. A special MAC (G, T, V) is said to be $(t, \epsilon)$-RMA secure if the following holds: Let
$sk \leftarrow G(1^n)$. Let $(m_1, \ldots, m_t)$ be chosen uniformly at random from $\mathcal{M}$. Let $\sigma_i \leftarrow T(m_i, sk, r)$. Then for any
probabilistic algorithm A running in time t,

$$\Pr_{sk,\,(m_1,\ldots,m_t),\,(\sigma_1,\ldots,\sigma_t)}\bigl[A(m_1, \ldots, m_t, \sigma_1, \ldots, \sigma_t) = (m', \sigma')\bigr] \le \epsilon$$

where $V(m', sk, \sigma') = 1$ and $m' \neq m_i$ for all $i = 1, \ldots, t$.

It is known how to construct MACs meeting the requirements in Definition 79 under standard
cryptographic assumptions (see [Gol04]).

6.2.1 A general hardness result based on Message Authentication Codes. The next theorem shows that
special MACs yield hardness results for inverse approximate uniform generation.

Theorem 80. Let $c > 0$ and (G, T, V) be a $(t, \epsilon)$-RMA secure special MAC for some $t = 2^{n^c}$ and $\epsilon = 1/t$
with a message space $\mathcal{M}$ of size $2^{\Omega(n)}$. Let $V_{sk}$ denote the function $V_{sk} : (m, \sigma) \mapsto V(m, sk, \sigma)$. If $V_{sk} \in C$
for every sk, then there exists $\delta > 0$ such that C is $(t_1, \kappa, \eta)$-hard for inverse approximate uniform generation
for $t_1 = 2^{n^\delta}$ and $\kappa = \eta = 1 - 2^{-n^\delta}$.

Proof. Towards a contradiction, let us assume that there is an algorithm $A_{inv}$ for inverse approximate
uniform generation of C which runs in time $t_1$ and with probability $1 - \eta$ outputs a sampler whose statistical
distance is at most $\kappa$ from the target distribution. (We will set $t_1$, $\kappa$ and $\eta$ later in the proof.) We will use
$A_{inv}$ to contradict the security of the MAC. Let sk be a secret key chosen according to $G(1^n)$. Now, the
adversary receives message-tag pairs $(m_1, \sigma_1), \ldots, (m_{t_1}, \sigma_{t_1})$ where $m_1, \ldots, m_{t_1}$ are chosen independently
at random from $\mathcal{M}$. Because the MAC is special, for each i we have that $\sigma_i$ is a uniformly random valid tag
for the message $m_i$. Hence each $(m_i, \sigma_i)$ is an independent and uniformly random satisfying assignment of
$V_{sk}$.

We can thus run $A_{inv}$ on the samples $(m_1, \sigma_1), \ldots, (m_{t_1}, \sigma_{t_1})$ with its accuracy parameter set to $\kappa$ and
its confidence parameter set to $1 - \eta$. Taking $\kappa = \eta = 1 - 2^{-n^\delta}$, we can choose $\delta$ small enough compared
to c, and with $t_1 = 2^{n^\delta}$ we get that the total running time of $A_{inv}$ is at most $2^{n^c}/2$. By the definition of
inverse approximate uniform generation, with probability at least $1 - \eta = 2^{-n^\delta}$ the algorithm $A_{inv}$ outputs
a sampler for a distribution Z that is $\kappa = (1 - 2^{-n^\delta})$-close to the uniform distribution over the satisfying
assignments of $V_{sk}$. Now, observe that

$$\Pr_{(m,\sigma) \sim U_{V_{sk}^{-1}(1)}}\bigl[m_i \neq m \text{ for all } i \in [t_1]\bigr] \ge 1 - \frac{t_1}{|\mathcal{M}|}.$$

Thus,

$$\Pr_{z \sim Z}\bigl[z = (m, \sigma) \text{ and } m_i \neq m \text{ for all } i \in [t_1]\bigr] \ge (1 - \kappa) - \frac{t_1}{|\mathcal{M}|}.$$

This means that with probability $(1 - \eta) \cdot \bigl((1 - \kappa) - t_1/|\mathcal{M}|\bigr)$, the adversary can output a forgery. It is clear
that for a suitable choice of $\delta$ relative to c, recalling that $\kappa = \eta = 1 - 2^{-n^\delta}$, the probability of outputting a
forgery is greater than $2^{-n^c}$, which contradicts the security of the MAC. □
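The last step of the proof can be spelled out as follows. This is only a sketch under stated assumptions: the constant $a > 0$ with $|\mathcal{M}| \ge 2^{an}$ is a name introduced here for the constant hidden in the $2^{\Omega(n)}$ bound, $\delta$ is taken smaller than both c and 1, and n is assumed large enough for the two side conditions to hold.

```latex
\begin{align*}
\Pr[\text{forgery}]
  &\ge (1-\eta)\Bigl((1-\kappa) - \frac{t_1}{|\mathcal{M}|}\Bigr)
   \ge 2^{-n^{\delta}}\Bigl(2^{-n^{\delta}} - 2^{\,n^{\delta} - an}\Bigr) \\
  &\ge 2^{-n^{\delta}} \cdot 2^{-n^{\delta}-1}
   = 2^{-2n^{\delta}-1}
   && \text{(once } 2n^{\delta} + 1 \le an\text{)} \\
  &> 2^{-n^{c}}
   && \text{(once } 2n^{\delta} + 1 < n^{c}\text{)}.
\end{align*}
```

Both side conditions hold for all sufficiently large n whenever $\delta < \min(c, 1)$, which is the sense in which $\delta$ is chosen "small enough compared to c".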

Unlike signature schemes, which permitted intricate reductions (cf. Theorem 60), in the case of MACs
we get a hardness result for complexity class C only if V s k itself belongs to C. While special MACs are 
known to exist assuming the existence of one-way functions [Gol04], the constructions are rather involved 
and rely on constructions of pseudorandom functions (PRFs) as an intermediate step. As a result, the 
verification algorithm V also involves computing PRFs; this means that using these standard constructions, 
one can only get hardness results for a class C if PRFs can be computed in C. As a result, the class C tends 
to be fairly complex, making the corresponding hardness result for inverse approximate uniform generation 
for C somewhat uninteresting. 

One way to bypass this is to use constructions of MACs which do not involve the use of PRFs as an
intermediate step. In recent years there has been significant progress in this area [KPC+11, DKPW12]. While both
these papers describe several MACs which do not require PRFs, the one most relevant for us is the MAC
construction of [KPC+11] based on the hardness of the "Learning Parity with Noise" (LPN) problem.

6.2.2 Some specific hardness assumptions, and a corresponding specific hardness result. We first 
state a "decision" version of LPN. To do this, we need the following notation: 

• Let $\mathrm{Ber}_\tau$ denote the following distribution over GF(2): If $x \leftarrow \mathrm{Ber}_\tau$, then $\Pr[x = 1] = \tau$.

• For $x \in GF(2)^n$, we use $\Lambda(x, \tau, \cdot)$ to denote the distribution $(r, x \cdot r \oplus e)$ over $GF(2)^n \times GF(2)$
where $r \sim GF(2)^n$ and $e \sim \mathrm{Ber}_\tau$ and $x \cdot r = \oplus_i x_i r_i \pmod 2$.

Assumption 2. Let $\tau \in (0, 1/2)$ and let $O_{x,\tau}$ be an oracle which, each time it is invoked, returns an
independent uniformly random sample from $\Lambda(x, \tau, \cdot)$. The LPN assumption states that for any poly(n)-
time algorithm A,

$$\Bigl|\Pr_{x \in GF(2)^n}\bigl[A^{O_{x,\tau}} = 1\bigr] - \Pr_{x \in GF(2)^n}\bigl[A^{O_{x,1/2}} = 1\bigr]\Bigr| \le \epsilon$$

for some $\epsilon$ which is negligible in n.

LPN is a well-studied problem; despite intensive research effort, the fastest known algorithm for this
problem takes time $2^{O(n/\log n)}$ [BKW03]. For our applications, we will need a variant of the above LPN
assumption. To define the assumption, let $\Lambda(x, \ell, \tau, \cdot)$ denote the distribution over $(A, A \cdot x \oplus e)$ where A
is uniformly random in $GF(2)^{\ell \times n}$ and e is uniformly random over the set $\{z \in GF(2)^\ell : wt(z) \le \lfloor \tau\ell \rfloor\}$.
The vector e is usually referred to as the noise vector.
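To make the exact-LPN distribution concrete, the following is a minimal Python sketch of drawing one sample $(A, A \cdot x \oplus e)$. The function names, the representation of GF(2) vectors as lists of 0/1 integers, and the reading of the weight bound as $\lfloor \tau\ell \rfloor$ are assumptions made for illustration only.

```python
import math
import random


def bounded_weight_vector(ell, w, rng):
    """Return a uniformly random vector in {0,1}^ell with at most w ones."""
    # There are C(ell, k) vectors of weight exactly k, so first pick the
    # weight k with probability proportional to C(ell, k) ...
    counts = [math.comb(ell, k) for k in range(w + 1)]
    t = rng.randrange(sum(counts))
    k = 0
    while t >= counts[k]:
        t -= counts[k]
        k += 1
    # ... then pick a uniformly random support of size k.
    e = [0] * ell
    for i in rng.sample(range(ell), k):
        e[i] = 1
    return e


def sample_exact_lpn(x, ell, tau, rng):
    """Draw one sample (A, A.x + e) with A uniform in GF(2)^{ell x n}."""
    n = len(x)
    w = math.floor(tau * ell)  # assumed reading of the noise weight bound
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(ell)]
    e = bounded_weight_vector(ell, w, rng)
    b = [(sum(a & xi for a, xi in zip(row, x)) + ei) % 2
         for row, ei in zip(A, e)]
    return A, b
```

Unlike standard LPN, where each noise coordinate is an independent $\mathrm{Ber}_\tau$ draw, the noise here is drawn from a set with a hard weight bound, so $A \cdot x \oplus b$ never has more than $\lfloor \tau\ell \rfloor$ ones.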

Assumption 3. Let $\tau \in (0, 1/2)$, $\ell = c \cdot n$ for some $0 < c < 1/2$ and let $O_{x,\ell,\tau}$ be an oracle which
returns a uniformly random sample from $\Lambda(x, \ell, \tau, \cdot)$. Then the $(t, \epsilon)$ exact LPN assumption states that for
any algorithm A running in time t,

$$\Bigl|\Pr_{x \in GF(2)^n}\bigl[A^{O_{x,\ell,\tau}} = 1\bigr] - \Pr_{x \in GF(2)^n}\bigl[A^{O_{x,\ell,1/2}} = 1\bigr]\Bigr| \le \epsilon.$$

For the sake of brevity, we henceforth refer to this assumption by saying "the exact $(n, \ell, \tau)$ LPN problem
is $(t, \epsilon)$-hard."

The above conjecture seems to be very closely related to Assumption 2, but it is not known whether
Assumption 2 formally reduces to Assumption 3. Assumption 3 has previously been suggested in the
cryptographic literature [KSS10] in the context of getting perfect completeness in LPN-based protocols. We note
that Arora and Ge [AG11] have investigated the complexity of this problem and gave an algorithm which
runs in time $n^{O(\ell)}$. We believe that the proximity of Assumption 3 to the well-studied Assumption 2, as well
as the failure to find algorithms for Assumption 3, make it a plausible conjecture. For the rest of this section
we use Assumption 3 with $t = 2^{n^\beta}$ and $\epsilon = 2^{-n^\beta}$ for some fixed $\beta > 0$.

We next define a seemingly stronger variant of Assumption 3 which we call subset exact LPN. This
requires the following definitions: For $x, v \in GF(2)^n$, $\ell, d \le n$ and $\tau \in (0, 1/2)$, we define the distribution
$\Lambda^a(x, v, \ell, \tau, \cdot)$ as follows:

$$\Lambda^a(x, v, \ell, \tau, \cdot) = \begin{cases} \Lambda(x \cdot v, \ell, 1/2, \cdot) & \text{if } wt(v) < d \\ \Lambda(x \cdot v, \ell, \tau, \cdot) & \text{if } wt(v) \ge d \end{cases}$$

where $x \cdot v \in GF(2)^n$ is defined by $(x \cdot v)_i = x_i \cdot v_i$. In other words, if $wt(v) \ge d$, then the distribution
$\Lambda^a(x, v, \ell, \tau)$ projects x into the non-zero coordinates of v and then outputs samples corresponding to exact
LPN for the projected vector. We define the oracle $O^a_{x,\ell,d,\tau}(\cdot)$ which takes an input $v \in GF(2)^n$ and outputs
a random sample from $\Lambda^a(x, v, \ell, \tau, \cdot)$. The subset exact LPN assumption states the following:

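Assumption 4. Let $\tau \in (0, 1/2)$, $\ell = c \cdot n$ and $d = c' \cdot n$ for some $0 < c, c' < 1/2$. The $(t, \epsilon)$-subset exact
LPN assumption says that for any algorithm A running in time t,

$$\Bigl|\Pr_{x \in GF(2)^n}\bigl[A^{O^a_{x,\ell,d,\tau}} = 1\bigr] - \Pr_{x \in GF(2)^n}\bigl[A^{O^a_{x,\ell,d,1/2}} = 1\bigr]\Bigr| \le \epsilon.$$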



For the sake of brevity, we henceforth refer to this assumption by saying "the subset exact $(n, \ell, d, \tau)$ LPN
problem is $(t, \epsilon)$-hard."

Assumption 4 is very similar to the subset LPN assumption used in [KPC+11] and previously considered
in [Pie12]. The subset LPN assumption is the same as Assumption 4 but with $\ell = 1$ and the coordinates
of the noise vector e being drawn independently from $\mathrm{Ber}_\tau$. Pietrzak [Pie12] showed that the subset LPN
assumption is implied by the standard LPN assumption (Assumption 2) with a minor change in the security
parameters. Along the same lines, the next lemma shows that Assumption 3 implies Assumption 4 with a
minor change in parameters. The proof is identical to the proof of Theorem 1 in [Pie12] and hence we do
not repeat it here.

Lemma 81. If the exact $(n, \ell, \tau)$ LPN problem is $(t, \epsilon)$ hard, then for any $g \in \mathbb{N}$, the subset exact
$(n', \ell, n + g, \tau)$ LPN problem is $(t', \epsilon')$ hard for $n' \ge n + g$, $t' = t/2$ and $\epsilon' = \epsilon + \frac{2t}{2^{g+1}}$.

Proof. The proof of this lemma follows verbatim from the proof of Theorem 1 in [Pie12]. The key
observation is that the reduction from subset LPN to LPN in Theorem 1 in [Pie12] is independent of the noise
distribution. □

From Lemma 81, we get that Assumption 3 implies Assumption 4. In particular, we can set $\ell = n/5$
and $g = n/10$, $n' \ge 11n/10$. Then we get that if the exact $(n, \ell, \tau)$ problem is $(2^{n^\beta}, 2^{-n^\beta})$ hard for some
$\beta > 0$, then the subset exact $(n', \ell, 11n/10, \tau)$ is also $(2^{n^{\beta'}}, 2^{-n^{\beta'}})$ hard for some other $\beta' > 0$. For the rest
of this section, we set the value of $\ell$ and g as above and we assume that the subset exact $(n', \ell, 11n/10, \tau)$
is $(2^{n^{\beta'}}, 2^{-n^{\beta'}})$ hard for some $\beta' > 0$.
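The parameter change can be checked directly. The calculation below is a sketch that reads the bound of Lemma 81 as $\epsilon' = \epsilon + 2t/2^{g+1}$ and additionally assumes $\beta < 1$ and $n' = \Theta(n)$; neither assumption is stated explicitly in the text.

```latex
\begin{align*}
t' &= \frac{t}{2} = 2^{\,n^{\beta}-1}, \\
\epsilon' &= \epsilon + \frac{2t}{2^{g+1}}
           = 2^{-n^{\beta}} + \frac{2 \cdot 2^{n^{\beta}}}{2^{\,n/10+1}}
           = 2^{-n^{\beta}} + 2^{\,n^{\beta} - n/10}
           \le 2^{-n^{\beta}+1}
           && \text{(once } 2n^{\beta} \le n/10\text{)}.
\end{align*}
```

Under these assumptions, any fixed $\beta' < \beta$ satisfies $2^{(n')^{\beta'}} \le t'$ and $2^{-(n')^{\beta'}} \ge \epsilon'$ for all sufficiently large n, which is one way to see why the exponent degrades only slightly.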

Now we are ready to define the following Message Authentication Code (MAC) (G, S, V), which we 
refer to as LPN-MAC: 

• The key generation algorithm G chooses a random matrix $X \in GF(2)^{\lambda \times n}$ and a string $x' \in GF(2)^\lambda$,
where $\lambda = 2n$.

• The tagging algorithm samples $R \in GF(2)^{\ell \times \lambda}$ and $e \in GF(2)^\ell$ where e is a randomly chosen vector
in $GF(2)^\ell$ with at most $\lfloor \tau\ell \rfloor$ ones. The algorithm outputs $(R, R^T \cdot (X \cdot m + x') + e)$.

• The verification algorithm, given tag (R, Z) for message m, computes $y = Z + R^T \cdot (X \cdot m + x')$
and accepts if and only if the total number of ones in y is at most $\lfloor \tau\ell \rfloor$.
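The three algorithms of LPN-MAC can be sketched in a few lines of Python. This is an illustrative toy, not a secure implementation: the function names are hypothetical, the weight bound is read as $\lfloor \tau\ell \rfloor$, the noise is assumed to be uniform over all vectors within that bound, and R is treated as an $\ell \times \lambda$ matrix applied directly to the vector $X \cdot m + x'$ (the index convention of Definition 83).

```python
import math
import random


def bounded_weight_vector(ell, w, rng):
    """Uniformly random vector in {0,1}^ell with at most w ones."""
    counts = [math.comb(ell, k) for k in range(w + 1)]
    t = rng.randrange(sum(counts))
    k = 0
    while t >= counts[k]:
        t -= counts[k]
        k += 1
    e = [0] * ell
    for i in rng.sample(range(ell), k):
        e[i] = 1
    return e


def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


def keygen(n, rng):
    """Secret key: random X in GF(2)^{lam x n} and x' in GF(2)^lam, lam = 2n."""
    lam = 2 * n
    X = [[rng.randrange(2) for _ in range(n)] for _ in range(lam)]
    x_prime = [rng.randrange(2) for _ in range(lam)]
    return X, x_prime


def masked_message(sk, m):
    """Compute X.m + x' over GF(2)."""
    X, x_prime = sk
    return [(a + b) % 2 for a, b in zip(mat_vec(X, m), x_prime)]


def tag(m, sk, ell, tau, rng):
    """Tag (R, Z) with Z = R.(X.m + x') + e, where e has bounded weight."""
    s = masked_message(sk, m)
    R = [[rng.randrange(2) for _ in range(len(s))] for _ in range(ell)]
    e = bounded_weight_vector(ell, math.floor(tau * ell), rng)
    Z = [(a + b) % 2 for a, b in zip(mat_vec(R, s), e)]
    return R, Z


def verify(m, sk, sigma, tau):
    """Accept iff y = Z + R.(X.m + x') has at most floor(tau * ell) ones."""
    R, Z = sigma
    y = [(a + b) % 2 for a, b in zip(Z, mat_vec(R, masked_message(sk, m)))]
    return sum(y) <= math.floor(tau * len(Z))
```

Because the noise weight is bounded with certainty rather than in expectation, every honestly generated tag verifies, which is the perfect-completeness property that motivates the exact variant of LPN.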

Note that all arithmetic operations in the description of the above MAC are done over GF(2). The
following theorem shows that under suitable assumptions the above MAC is special and secure as desired: 

Theorem 82. Assuming that the exact $(n, \ell, \tau)$ problem is $(t, \epsilon)$ hard for $t = 2^{n^\beta}$ and $\epsilon = 2^{-n^\beta}$ for $\beta > 0$,
LPN-MAC described above is a $(t', \epsilon')$-RMA-secure special MAC for $t' = 2^{n^{\beta'}}$ and $\epsilon' = 2^{-n^{\beta'}}$ for some
$\beta' > 0$.

Proof. First, it is trivial to observe that the MAC described above is a special MAC. Thus, we are only
left with the task of proving the security of this construction. In [KPC+11] (Theorem 5), the authors show
that the above MAC is secure with the above parameters under Assumption 2 provided the vector e in the
description of LPN-MAC is drawn from a distribution where every coordinate of e is an independent draw
from $\mathrm{Ber}_\tau$. (We note that the MAC of Theorem 5 in [KPC+11] is described in a slightly different way, but
Dodis et al. [DKPW12] show that the above MAC and the MAC of Theorem 5 in [KPC+11] are exactly the
same.) Following the same proof verbatim, except that whenever [KPC+11] use the subset LPN assumption
we use the subset exact LPN assumption (i.e. Assumption 4), we obtain a proof of Theorem 82. □

6.2.3 A problem for which inverse approximate uniform generation is hard but approximate uniform
generation is easy. Given Theorem 80, in order to come up with a problem where inverse approximate
uniform generation is hard but approximate uniform generation is easy, it remains only to show that
the verification algorithm for LPN-MAC can be implemented in a class of functions for which approximate
uniform generation is easy. Towards this, we have the following definition.

Definition 83. BILINEAR-MAJORITY$_{\ell,n,\lambda,\tau}$ is a class of Boolean functions such that every
$f \in$ BILINEAR-MAJORITY$_{\ell,n,\lambda,\tau}$, $f : GF(2)^{\ell \times \lambda} \times GF(2)^\ell \times GF(2)^n \to \{0, 1\}$, is parameterized by subsets
$S_1, \ldots, S_\lambda \subseteq [n]$ and $x^0 \in GF(2)^\lambda$ and is defined as follows: On input $(R, Z, m) \in GF(2)^{\ell \times \lambda} \times GF(2)^\ell \times GF(2)^n$,
define

$$y_i = Z_i + \sum_{j=1}^{\lambda} R_{ij} \cdot \Bigl(\sum_{k \in S_j} m_k + x^0_j\Bigr)$$

where all the additions and multiplications are in GF(2). Then $f(R, Z, m) = 1$ if and only if at most
$\lfloor \tau\ell \rfloor$ coordinates $y_1, \ldots, y_\ell$ are 1.

Claim 84. For the LPN-MAC with parameters $\ell$, n, $\lambda$ and $\tau$ described earlier, the verification algorithm V
can be implemented in the class BILINEAR-MAJORITY$_{\ell,n,\lambda,\tau}$.

Proof. Consider the LPN-MAC with parameters $\ell$, n, $\lambda$ and $\tau$ and secret key X and x'. Now define a
function f in BILINEAR-MAJORITY$_{\ell,n,\lambda,\tau}$ where $x^0 = x'$ and the subset $S_j = \{i : X_{ji} = 1\}$. It is easy to
check that the corresponding $f(R, Z, m) = 1$ if and only if (R, Z) is a valid tag for message m. □

The next and final claim says that there is an efficient approximate uniform generation algorithm for
BILINEAR-MAJORITY$_{\ell,n,\lambda,\tau}$:

Claim 85. There is an algorithm which given any $f \in$ BILINEAR-MAJORITY$_{\ell,n,\lambda,\tau}$ (with parameters
$S_1, \ldots, S_\lambda \subseteq [n]$ and $x^0 \in GF(2)^\lambda$) and an input parameter $\delta > 0$, runs in time poly$(n, \ell, \lambda, \log(1/\delta))$
and outputs a distribution which is $\delta$-close to being uniform on $f^{-1}(1)$.

Proof. The crucial observation is that for any (R, m), the set $A_{R,m} = \{Z : f(R, Z, m) = 1\}$ has cardinality
independent of R and m. This is because after we fix R and m, if we define $b_i = \sum_{j=1}^{\lambda} R_{ij} \cdot (\sum_{k \in S_j} m_k + x^0_j)$,
then $y_i = Z_i + b_i$. Thus, for every fixing of R and m, since $b_i$ is fixed, the number of those Z such that the
number of $y_i$'s which are 1 is bounded by $\lfloor \tau\ell \rfloor$ is independent of R and m. This implies that the following
sampling algorithm returns a uniformly random element of $f^{-1}(1)$:

• Randomly sample R and m. Compute $b_i$ as defined earlier.

• Let $a = \lfloor \tau\ell \rfloor$ and consider the halfspace $g(y) = \mathrm{sign}(a - \sum_{i=1}^{\ell} y_i)$. Now, we use Theorem 41
to sample uniformly at random from $g^{-1}(1)$ and hence draw a uniformly random y from the set
$\{y \in \{0, 1\}^\ell : \sum_{i=1}^{\ell} y_i \le a\}$.

• We set $Z_i = y_i + b_i$. Output (R, Z, m).

The guarantee on the running time of the procedure follows simply by using the running time of Theorem 41.
Similarly, the statistical distance of the output from the uniform distribution on $f^{-1}(1)$ is at most $\delta$. □
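The three sampling steps in the proof can be sketched in Python as follows. One substitution should be noted plainly: where the proof invokes Theorem 41 to sample (approximately) from the halfspace $g^{-1}(1)$, this sketch exploits the symmetry of g and samples exactly by first choosing the Hamming weight and then a random support. All function names are hypothetical, the sets $S_j$ are 0-indexed, and the threshold is read as $a = \lfloor \tau\ell \rfloor$.

```python
import math
import random


def bounded_weight_vector(ell, a, rng):
    """Uniformly random y in {0,1}^ell with sum(y) <= a."""
    counts = [math.comb(ell, k) for k in range(a + 1)]
    t = rng.randrange(sum(counts))
    k = 0
    while t >= counts[k]:
        t -= counts[k]
        k += 1
    y = [0] * ell
    for i in rng.sample(range(ell), k):
        y[i] = 1
    return y


def offsets(S, x0, R, m):
    """b_i = sum_j R[i][j] * (sum_{k in S_j} m[k] + x0[j]) over GF(2)."""
    inner = [(sum(m[k] for k in S_j) + x0_j) % 2 for S_j, x0_j in zip(S, x0)]
    return [sum(r & v for r, v in zip(row, inner)) % 2 for row in R]


def evaluate(S, x0, tau, R, Z, m):
    """Evaluate the BILINEAR-MAJORITY function with parameters (S, x0)."""
    b = offsets(S, x0, R, m)
    y = [(z + bi) % 2 for z, bi in zip(Z, b)]
    return 1 if sum(y) <= math.floor(tau * len(Z)) else 0


def sample_satisfying_assignment(S, x0, n, ell, tau, rng):
    """Return (R, Z, m) with f(R, Z, m) = 1, following the proof's steps."""
    lam = len(S)
    # Step 1: sample R and m uniformly and compute the offsets b.
    R = [[rng.randrange(2) for _ in range(lam)] for _ in range(ell)]
    m = [rng.randrange(2) for _ in range(n)]
    b = offsets(S, x0, R, m)
    # Step 2: sample y uniformly among vectors of weight at most a.
    y = bounded_weight_vector(ell, math.floor(tau * ell), rng)
    # Step 3: shift by b so that Z + b = y.
    Z = [(yi + bi) % 2 for yi, bi in zip(y, b)]
    return R, Z, m
```

Since every pair (R, m) has the same number of accepting Z, choosing (R, m) uniformly and then Z uniformly among the accepting vectors yields a uniform element of $f^{-1}(1)$.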
7 Efficient inverse approximate uniform generation when approximate uniform generation is infeasible

In Section 4 we gave an efficient algorithm for the inverse approximate uniform generation problem for
halfspaces, and in Section 5 we gave a quasi-polynomial time algorithm for the inverse approximate uniform
generation problem for DNFs. Since both these algorithms follow the standard approach, both crucially
use efficient algorithms for the corresponding uniform generation problems [KLM89, MS04]. In this
context, it is natural to ask the following question: Is inverse approximate uniform generation easy only if the
corresponding approximate uniform generation problem is easy? 

In this section we show that the answer to this question is "no" (for at least two reasons). First, we point 
out that a negative answer follows easily from the well-known fact that it is computationally hard to "detect 
unique solutions." In more detail, we recall the definition of the UNIQUE-SAT problem. UNIQUE-SAT is 
a promise problem where given a CNF $\Phi$, the task is to distinguish between the following two cases:

• $\Phi$ has no satisfying assignment; versus

• $\Phi$ has exactly one satisfying assignment.

In a famous result, Valiant and Vazirani [VV86] showed the following.

Theorem 86. [VV86] There is a randomized polynomial time reduction from CNF-SAT to UNIQUE-SAT.

Let C denote the class of all n-variable CNF formulas that have exactly one satisfying assignment. As
an immediate corollary of Theorem 86 we have the following:

Corollary 87. There is a constant $c > 0$ such that unless SAT $\in$ BPTIME$(t(n))$, there is no approximate
uniform generation algorithm for C which runs in time BPTIME$(t(n^c))$ even for variation distance $\epsilon = 1/2$.

On the other hand, it is clear that there is a linear time algorithm for the inverse approximate uniform 
generation problem for the class C: simply draw a single example x and output the trivial distribution 
supported on that one example. 

The above simple argument shows that there indeed exist classes C where inverse approximate uniform
generation is "easy" but approximate uniform generation is "hard", but this example is somewhat unsatisfying,
as the algorithm for inverse approximate uniform generation is trivial. It is natural to ask the following
meta-question: is there a class of functions C such that approximate uniform generation is hard, but inverse
approximate uniform generation is easy because of a polynomial-time algorithm that "uses its samples in a
non-trivial way?" In the rest of this section we give an example of such a problem.

Efficient inverse approximate uniform generation for graph automorphism. The following problem is 
more naturally defined in terms of a relation over combinatorial objects rather than in terms of a function 
and its satisfying assignments. Let us define $\mathcal{G}_n$ to be the set of all (simple undirected) graphs over vertex set
[n] and $S_n$ to be the symmetric group over [n]. We define the relation $R_{aut}(G, \sigma)$ over $\mathcal{G}_n \times S_n$ as follows:
$R_{aut}(G, \sigma)$ holds if and only if $\sigma$ is an automorphism for the graph G. (Recall that "$\sigma$ is an automorphism
for graph G" means that (x, y) is an edge in G if and only if $(\sigma(x), \sigma(y))$ is also an edge in G.) The inverse
approximate uniform generation problem for the relation $R_{aut}$ is then as follows: There is an unknown
n-vertex graph G. The algorithm receives uniformly random samples from the set Aut(G) := $\{\sigma \in S_n :
R_{aut}(G, \sigma) \text{ holds}\}$. On input $\epsilon, \delta$, with probability $1 - \delta$ the algorithm must output a sampler whose output
distribution is $\epsilon$-close to the uniform distribution over Aut(G).

It is easy to see that Aut(G) is a subgroup of $S_n$, and hence the identity permutation $e_n$ must belong to
Aut(G). To understand the complexity of this problem we recall the graph isomorphism problem:

Definition 88. GRAPH-ISOMORPHISM is defined as follows: The input is a pair of graphs $G_1, G_2 \in \mathcal{G}_n$
and the goal is to determine whether they are isomorphic.

While it is known that GRAPH-ISOMORPHISM is unlikely to be NP-complete [Sch88, BHZ87], even
after several decades of effort the fastest known algorithm for GRAPH-ISOMORPHISM has a running
time of $2^{\tilde{O}(\sqrt{n})}$ [Bab81]. This gives strong empirical evidence that GRAPH-ISOMORPHISM is a
computationally hard problem. The following claim establishes that approximate uniform generation for $R_{aut}$ is
as hard as GRAPH-ISOMORPHISM:

Claim 89. If there is a t(n)-time algorithm for approximate uniform generation for the relation $R_{aut}$ (with
error 1/2), then for some absolute constant $c > 0$ there is a poly$(t(n^c))$-time randomized algorithm for
GRAPH-ISOMORPHISM.

Proof. Let A be the hypothesized t(n)-time algorithm, so A, run on input (G, 1/2) where G is an n-node
graph, returns an element $\sigma \in$ Aut(G) drawn from a distribution D that has $d_{TV}(D, U_{\mathrm{Aut}(G)}) \le 1/2$.
Given such an algorithm A, it is easy in O(t(n)) time to determine (with high constant probability of
correctness) whether or not |Aut(G)| > 1. Now the claim follows from the known fact [Hof82] that there
is a polynomial-time reduction from GRAPH-ISOMORPHISM to the problem of determining whether an
input graph has |Aut(G)| > 1. □

While approximate uniform generation for $R_{aut}$ is hard, the next theorem shows that the inverse
approximate uniform generation problem for $R_{aut}$ is in fact easy:

Theorem 90. There is a randomized algorithm $A^{aut}_{inv}$ with the following property: The algorithm takes as
input $\epsilon, \delta > 0$. Given access to uniform random samples from Aut(G) (where G is an unknown n-node
graph), $A^{aut}_{inv}$ runs in time poly$(n, \log(1/\epsilon), \log(1/\delta))$ and with probability $1 - \delta$ outputs a sampler $C_{aut}$
with the following property: The running time of $C_{aut}$ is $O(n \log n + \log(1/\epsilon))$ and the output distribution
of $C_{aut}$ is $\epsilon$-close to the uniform distribution over Aut(G).

Proof. The central tool in the proof is the following theorem of Alon and Roichman [AR94]:

Theorem 91. [AR94] Let H be any group and let $h_1, \ldots, h_k$ be chosen uniformly at random from H.
Consider the set $S = \cup_{i=1}^{k} \{h_i, h_i^{-1}\}$. Then, for $k = O(\log |H| + \log(1/\delta))$, with probability at least $1 - \delta$
the Cayley graph (H, S) has its second largest eigenvalue at most 1/2.

We now describe our algorithm $A^{aut}_{inv}$. On input $\epsilon, \delta$ it draws $k = O(n \log n + \log(1/\delta))$ permutations
$g_1, \ldots, g_k$ from Aut(G). It computes $g_1^{-1}, \ldots, g_k^{-1}$ and sets $S = \cup_{i=1}^{k} \{g_i, g_i^{-1}\}$. The sampler $C_{aut}$ is
defined as follows: It uses its input random bits to perform a random walk on the Cayley graph (Aut(G), S),
starting at $e_n$, for $T = O(n \log n + \log(1/\epsilon))$ steps; it outputs the element of Aut(G) which it reaches at the end
of the walk. (Note that in order to perform this random walk it is not necessary to have Aut(G) explicitly;
it suffices to explicitly have the set S.)

The analysis is simple: we first observe that every graph G has an automorphism group of size $|\mathrm{Aut}(G)| \le n!$.
Theorem 91 then guarantees that with probability at least $1 - \delta$ the Cayley graph (Aut(G), S) has its
second eigenvalue bounded by 1/2. Assuming that the second eigenvalue is indeed at most 1/2, standard
results in the theory of random walks on graphs imply that the distribution of the location reached at the end
of the walk has variation distance at most $\epsilon$ from the uniform distribution over Aut(G). This concludes the
proof. □
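The sampler built in the proof can be sketched in Python as below. The names `build_sampler`, `compose` and `inverse` are hypothetical, permutations are represented as tuples with `p[i]` the image of i, and the number of steps is left as a parameter rather than derived from $\epsilon$. One deviation is labeled explicitly: the sketch uses a lazy walk (staying put with probability 1/2), a standard device for avoiding periodicity that the text does not spell out.

```python
import random


def compose(p, q):
    """Return the permutation p∘q, i.e. i -> p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def build_sampler(samples, steps):
    """Build a sampler from permutations drawn from Aut(G).

    Only the generating multiset S = {g_i, g_i^{-1}} is stored; the group
    Aut(G) itself is never constructed.
    """
    n = len(samples[0])
    S = list(samples) + [inverse(g) for g in samples]
    identity = tuple(range(n))

    def sampler(rng):
        current = identity
        for _ in range(steps):
            if rng.random() < 0.5:
                continue  # lazy step (assumption of this sketch)
            current = compose(rng.choice(S), current)
        return current

    return sampler
```

For example, given the rotation `(1, 2, 3, 0)` and the reflection `(0, 3, 2, 1)` as samples from the automorphism group of the 4-cycle, `build_sampler([(1, 2, 3, 0), (0, 3, 2, 1)], steps=50)` returns a function that, on each call with a `random.Random` instance, outputs an automorphism of the 4-cycle. Every output lies in the subgroup generated by the samples, so the walk never leaves Aut(G).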
8 Conclusion and future work

We have considered inverse problems in approximate uniform generation for a range of interesting and well- 
studied classes of functions including LTFs, DNFs, CNFs, polynomial threshold functions, and more. While 
our findings have determined the computational complexity of inverse approximate uniform generation for 
these classes, several interesting questions and directions remain to be pursued. We outline some of these 
directions below. 

One natural goal is to extend our results (both positive and negative) to a wider range of function classes; 
we list several specific classes that seem particularly worthy of investigation. The first of these is the class of 
intersections of two monotone LTFs. We note that Morris and Sinclair [MS04] gave efficient approximate
uniform generation / counting algorithms for intersections of two monotone LTFs, but on the other hand,
no distribution independent PAC or SQ learning algorithm is known for this class (although quasipoly(n)-time
algorithms are known if both LTFs have integer weights that are at most poly(n) [KOS04]). The
second class is that of poly(n)-size decision trees. Our DNF result gives a quasipoly$(n/\epsilon)$-time inverse
approximate uniform generation algorithm for this class; can this be improved to poly$(n, 1/\epsilon)$? We note that
in order to obtain such a result one would presumably have to bypass the "standard approach," since decision
trees are not known to be PAC learnable faster than quasipoly$(n/\epsilon)$-time under the uniform distribution on
$\{-1, 1\}^n$. (We further note that while [FOS08] gives a reduction from learning the uniform distribution over
satisfying assignments of a decision tree to the problem of PAC learning decision trees under the uniform
distribution, this reduction relies crucially on the assumption, implicit in the [FOS08] framework,
that the probability mass function of the hypothesis distribution can be efficiently evaluated on any input
$x \in \{-1, 1\}^n$. In our framework this assumption need not hold, so the [FOS08] reduction does not apply.)
Still other natural classes to investigate are context free languages (for which quasi-polynomial time uniform
generation algorithms are known [GJK+97]) and various classes of branching programs. It may also be of
interest to consider similar problems when the underlying measure is (say) Gaussian or log-concave. 

Another interesting direction to pursue is to study inverse approximate uniform generation for combina- 
torial problems like matching and coloring as opposed to the "boolean function satisfying assignment"-type 
problems that have been the main focus of this paper. We note that preliminary arguments suggest that there 
is a simple efficient algorithm for inverse approximate uniform generation of perfect matchings in bipartite 
graphs. Similarly, preliminary arguments suggest that for the range of parameters for which the "forward" 
approximate uniform generation problem for colorings is known to be easy (namely, the number q of allowable
colors satisfies $q > 11\Delta/6$ where $\Delta$ is the degree [Vig99]), the inverse approximate uniform generation
problem also admits an efficient algorithm. These preliminary results give rise to the question of whether 
there are similar combinatorial problems for which the complexity of the "forward" approximate uniform 
generation problem is not known and yet we can determine the complexity of inverse approximate uniform 
generation (like the group theoretic setting of Section 7).

Finally, for many combinatorial problems, the approximate uniform generation algorithm is to run a 
Markov chain on the state space. In the regimes where the uniform generation problem is hard, the Markov 
chain does not mix rapidly which is in turn equivalent to the existence of sparse cuts in the state space. 
However, an intriguing possibility arises here: If one can show that the state space can be partitioned into 
a small number of components such that each component has no sparse cuts, then given access to a small 
number of random samples from the state space (with at least one such example belonging to each compo- 
nent), one may be able to easily perform approximate uniform generation. Since the inverse approximate 
uniform generation algorithms that we consider have access to random samples, this opens the possibility 
of efficient approximate uniform generation algorithms in such cases. To conclude, we give an example of 
a natural combinatorial problem (from statistical physics) where it seems that this is essentially the
situation (although we do not have a formal proof). This is the 2-D Ising model, for which the natural Glauber
dynamics is known to have exponential mixing time beyond the critical temperature [Mar98]. On the other
hand, it was recently shown that even beyond the critical temperature, if one fixes the boundary to have the
same spin (all positive or all negative) then the mixing time comes down from exponential to
quasipolynomial [LMST]. While we do not know of a formal reduction, the fact that fixing the boundary to the same
spin brings down the mixing time of the Glauber dynamics from exponential to quasipolynomial is "morally
equivalent" to the existence of only a single sparse cut in the state space of the graph [Sin12]. Finding other
such natural examples is an intriguing goal.